Memory DoS Attacks in Multi-tenant Clouds: Severity and Mitigation

Memory DoS Attacks in Multi-tenant Clouds: Severity and Mitigation


In cloud computing, network Denial of Service (DoS) attacks are well studied and defenses have been implemented, but severe DoS attacks on a victim’s working memory by a single hostile VM are not well understood. Memory DoS attacks are Denial of Service (or Degradation of Service) attacks caused by contention for hardware memory resources on a cloud server. Despite the strong memory isolation techniques for virtual machines (VMs) enforced by the software virtualization layer in cloud servers, the underlying hardware memory layers are still shared by the VMs and can be exploited by a clever attacker in a hostile VM co-located on the same server as the victim VM, denying the victim the working memory he needs. We first show quantitatively the severity of contention on different memory resources. We then show that a malicious cloud customer can mount low-cost attacks to cause severe performance degradation for a Hadoop distributed application, and 38 delay in response time for an E-commerce website in the Amazon EC2 cloud.

Then, we design an effective, new defense against these memory DoS attacks, using a statistical metric to detect their existence and execution throttling to mitigate the attack damage. We achieve this by a novel re-purposing of existing hardware performance counters and duty cycle modulation for security, rather than for improving performance or power consumption. We implement a full prototype on the OpenStack cloud system. Our evaluations show that this defense system can effectively defeat memory DoS attacks with negligible performance overhead.





1 Introduction

Public Infrastructure-as-a-Service (IaaS) clouds provide elastic computing resources on demand to customers at low cost. Anyone with a credit card may host scalable applications in these computing environments, and become a tenant of the cloud. To maximize resource utilization, cloud providers schedule virtual machines (VMs) leased by different tenants on the same physical machine, sharing the same hardware resources.

While software isolation techniques, like VM virtualization, carefully isolate memory pages (virtual and physical), most of the underlying hardware memory hierarchy is still shared by all VMs running on the same physical machine in a multi-tenant cloud environment. Malicious VMs can exploit the multi-tenancy feature to intentionally cause severe contention on the shared memory resources to conduct Denial-of-Service (DoS) attacks against other VMs sharing the resources. Moreover, it has been shown that a malicious cloud customer can intentionally co-locate his VMs with victim VMs to run on the same physical machine [36, 39, 46]; this co-location attack can serve as a first step for performing memory DoS attacks against an arbitrary target.

The severity of memory resource contention has been seriously underestimated. While it is temping to presume the level of interference caused by resource contention is modest, and in the worst case, the resulting performance degradation is isolated on one compute node, we show this is not the case. We present advanced attack techniques that, when exploited by malicious VMs, can induce much more intense memory contention than normal applications could do, and can degrade the performance of VMs on multiple nodes.

To demonstrate that our attacks work on real applications in real-world settings, we applied them to two case studies conducted in a commercial IaaS cloud, Amazon Elastic Compute Cloud (EC2). We show that even if the attacker only has one VM co-located with one of the many VMs of the target multi-node application, significant performance degradation can be caused to the entire application, rather than just to a single node. In our first case study, we show that when the adversary co-locates one VM with one node of a 20-node distributed Hadoop application, he may cause up to 3.7 slowdown of the entire distributed application. Our second case study shows that our attacks can slow down the response latency of an E-commerce application (consisting of load balancers, web servers, database servers and memory caching servers) by up to 38 times, and reduce the throughput of the servers down to 13%.

Despite the severity of the attacks, neither current cloud providers nor research literature offer any solutions to memory DoS attacks. Our communication with cloud providers suggests such issues are not currently addressed, in part because the attack techniques presented in this paper are non-conventional, and existing solutions to network-based DDoS attacks do not help. Research studies have not explored defenses against adversarial memory contention either. As will be discussed in Sec. 6.2, existing solutions [19, 53, 47, 55, 13] only aim to enhance performance isolation between benign applications. Intentional memory abuses that are evident in memory DoS attacks are immune to these solutions.

Therefore, a large portion of this paper is devoted to the design and implementation of a novel and effective approach to detect and mitigate all known types of memory DoS attacks with low-cost overhead. Our detection strategy provides a generalized method for detecting deviations from the baseline behavior of the victim VM due to memory DoS attacks. We collect the baseline behaviors of the monitored VM at runtime, by creating a pseudo isolated period, without completely pausing co-tenant VMs. This provides periodic (re)establishment of baseline behaviors that adapt to changes in program phases and workload characteristics. Once memory DoS attacks are detected, we show how malicious VMs can be identified and their attacks mitigated, using a novel form of selective execution throttling.

We implemented a prototype of our defense solution on the opensource OpenStack cloud software, and extensively evaluated its effectiveness and efficiency. Our evaluation shows that we can accurately detect memory DoS attacks and promptly and effectively mitigate the attacks. The performance overhead of persistent performance monitoring is lower than 5%, which is low enough to be used in production public clouds. Because our solution does not require modifications of CPU hardware, hypervisor or guest operating systems, it minimally impacts the existing cloud implementations. Therefore, we envision our solution can be rapidly deployed in public clouds as a new security service to customers who require higher security assurances (like in Security-on-Demand clouds [24, 48]).

In summary, the contributions of this paper are:

  • A set of attack techniques to perform memory DoS attacks. Measurement of the severity of the resulting Degradation of Service (DoS) to the victim VM.

  • Demonstration of the severity of memory DoS attacks in public clouds (Amazon EC2) against Hadoop applications and E-commerce websites.

  • A novel, generalizable, attack detection method to detect abnormal probability distribution deviations at runtime, that adapts to program phase changes and different workload inputs.

  • A novel method for detecting the attack VM using selective execution throttling.

  • A new, rapidly deployable, defense for all memory DoS attacks with accurate detection and low-overhead, using existing hardware processor features.

We first discuss our threat model and background of memory resources in Sec. 2. Techniques to perform memory DoS attacks are presented in Sec. 3. We show the power of these attacks in two case studies conducted in Amazon EC2 in Sec. 4. Our new defense techniques are described and evaluated in Sec. 5. We summarize related work in Sec. 6 and conclude in Sec. 7.

2 Background

2.1 Threat Model and Assumptions

We consider security threats from malicious tenants of public IaaS clouds. We assume the adversary has the ability to launch at least one VM on the cloud servers on which the victim VMs are running. Techniques required to do so have been studied [36, 39, 46], and are orthogonal to our work. The adversary can run any program inside his own VM. We do not assume that the adversary can send network packets to the victim directly, thus resource freeing attacks [38] or network-based DoS attacks [28] are not applicable. We do not consider attacks from the cloud providers, or any attacks requiring direct control of privileged software.

We assume the software and hardware isolation mechanisms function correctly as designed. A hypervisor virtualizes and manages the hardware resources (see Figure 1) so that each VM thinks it has the entire computer. A server can have multiple processor packages, where all processor cores in a package share a Last Level Cache (LLC), while L1 and L2 caches are private to a processor core and not shared by different cores. All processor packages share the Integrated Memory Controller (IMC), the inter-package bus and the main memory storage (DRAM chips). Each VM is designated a disjoint set of virtual CPUs (vCPU), which can be scheduled to operate on any physical cores based on the hypervisor’s scheduling algorithms. A program running on a vCPU may use all the hardware resources available to the physical core it runs on. Hence, different VMs may simultaneously share the same hardware caches, buses, memory channels and DRAM bank buffers. We assume the cloud provider may schedule VMs from different customers on the same server (as co-tenants), but likely on different physical cores. As is the case today, software-based VM isolation by the hypervisor only isolates accesses to virtual and physical memory pages, but not to the underlying hardware memory resources shared by the physical cores.

Figure 1: An attacker VM (with 2 vCPUs) and a victim VM share multiple layers of memory resources.

2.2 Hardware Memory Resources

Figure 2: Shared storage-based and scheduling-based hardware memory resources in multi-core cloud servers.

Figure 2 shows the hardware memory resources in modern computers. Using Intel processors as examples, modern X86-64 processors usually consist of multiple processor packages, each of which consists of several physical processor cores. Each physical core can execute one or two hardware threads in parallel with the support of Hyper-Threading Technology. A hierarchical memory subsystem, from the top to the bottom, is composed of different levels of storage-based components (e.g., caches, the DRAMs). These memory components are inter-connected by a variety of scheduling-based components (e.g., memory buses and controllers), with various schedulers arbitrating their communications. Memory resources shared by different cores are described below:

Last Level Caches (LLC).  An LLC is shared by all cores in one package (older processors may have one package supported by multiple LLCs). Intel LLCs usually adopt an inclusive cache policy: every cache line maintained in the upper-level caches (i.e., core-private Level 1 and Level 2 caches in each core - not shown in Figure 2) also has a copy in the LLC. In other words, when a cache line in the LLC is evicted, so are the copies in the upper-level caches. A subsequent access to the memory block mapped to this cache line will result in an LLC miss, which will lead to the much slower main memory access. On recent Intel processors (since Nehalem), LLCs are split into multiple slices, each of which is associated with one physical core, although every core may use the entire LLC. Intel employs static hash mapping algorithms to translate the physical address of a cache line to one of the LLC slices that contains the cache line. These mappings are unique for each processor model and are not released to the public. So it is harder for attackers to generate LLC contention using the method from [43].

Memory Buses.  Intel uses a ring bus topology to interconnect components in the processor package, e.g., processor cores, LLC slices, Integrated Memory Controllers (IMCs), QuickPath Interconnect (QPI) agents, etc. The high-speed QPI provides point-to-point interconnections between different processor packages, and between each processor package and I/O devices. The memory controller bus connects the LLC slices to the bank schedulers in the IMC, and the DRAM bus connects the IMC’s schedulers to the DRAM banks. Current memory bus designs with high bandwidth make it very difficult for attackers to saturate the memory buses. Also, elimination of bus locking operations for normal atomic operations make bus locking attacks via normal atomic operations (e.g., [43]) less effective. However, some exotic atomic bus locking operations still exist.

DRAM banks.  Each DRAM package consists of several banks, each of which can be thought of as a two dimensional data array with multiple rows and columns. Each bank has a bank buffer to hold the most recently used row to speed up DRAM accesses. A memory access to a DRAM bank may either be served in the bank buffer, which is a buffer-hit (fast), or in the bank itself, which is a buffer-miss (slow).

Integrated Memory Controllers (IMC).  Each processor package contains one or multiple IMCs. The communications between an IMC and the portion of DRAM it controls are supported by multiple memory channels, each of which serves a set of DRAM banks. When the processor wants to access the data in the DRAM, it first calculates the bank that stores the data based on the physical address, then it sends the memory request to the IMC that controls the bank. The processor can request data from the IMC in the local package, as well as in a different package via QPI. The IMCs implement a bank priority queue for each bank they serve, to buffer the memory requests to this bank. A bank scheduler is used to schedule requests in the bank priority queue, typically using a First-Ready-First-Come-First-Serve algorithm that gives high scheduling priority to the request that leads to a buffer-hit in the DRAM bank buffer, and then to the request that arrived earliest. Once requests are scheduled by the bank scheduler, a channel scheduler will further schedule them, among requests from other bank schedulers, to multiplex the requests onto a shared memory channel. The channel scheduler usually adopts a First-Come-First-Serve algorithm, which favors the earlier requests. Modern DRAM and IMCs can handle a large amount of requests concurrently, so the it is less effective to flood the DRAM and IMCs to generate severe contention, as shown in [32].

3 Memory DoS Attacks

3.1 Fundamental Attack Strategies

We have classified all memory resources into either storage-based or scheduling-based resources. This helps us formulate the following two fundamental attack strategies for memory DoS attacks:

  • Storage-based contention attack. The fundamental attack strategy to cause contention on storage-based resources is to reduce the probability that the victim’s data is found in an upper-level memory resource (faster), thus forcing it to fetch the data from a lower-level resource (slower).

  • Scheduling-based contention attack. The fundamental attack strategy on a scheduling-based resource is to decrease the probability that the victim’s requests are selected by the scheduler, e.g., by locking the scheduling-based resources temporarily, tricking the scheduler to improve the priority of the attacker’s requests, or overwhelming the scheduler by submitting a huge amount of requests simultaneously.

We systematically show how memory DoS attacks can be constructed on different layers of memory resources (LLC in Sec. 3.2, bus in Sec. 3.3, memory controller and DRAM in Sec. 3.4). For each memory component, we first study the basic techniques the attacker can use to generate resource contention and affect the victim’s performance. We measure the effectiveness of the attack techniques on the victim VM with different vCPU locations and program features. Then we propose some practical attacks and evaluate their impacts on real-world benchmarks.

Testbed configuration.  To demonstrate the severity of different types of memory DoS attacks, we use a server configuration, representative of many cloud servers, configured as shown in Table 1. We use Intel processors, since they are the most common in cloud servers, but the attack methods we propose are general, and applicable to other processors and platforms as well.

Server Dell PowerEdge R720
Processor Packages Two 2.9 Intel Xeon E5-2667 (Sandy Bridge)
Cores per Package 6 physical cores, or 12 hardware threads with Hyper-Threading
Core-private L1 I and L1 D: each 32, 8-way set-associative;
Level 1 and Level 2 caches L2 cache: 256, 8-way set-associative
Last Level Cache (LLC) 15, 20-way set-associative, shared by cores in package, divided
into 6 slices of 2.5 each; one slice per core
Physical memory Eight 8 DRAMs, divided into 8 channels, and 1024 banks
Hypervisor Xen version 4.1.0
VM’s OS Ubuntu 12.04 Linux, with 3.13 kernel
Table 1: Testbed Configuration

In each of the following experiments, we launched two VMs, one as the attacker and the other as the victim. By default, each VM was assigned a single vCPU. We select a mix set of benchmarks for the victim: (1) We use a modified stream program [31, 32] as a micro benchmark to explore the effectiveness of the attacks on victims with different features. This program allocates two array buffers with the same size, one as the source and the other as the destination. It copies data from the source to the destination in loops repeatedly, either in a sequential manner (resulting a program with high memory locality) or in a random manner (low memory locality). We chose this benchmark because it is memory-intensive and allows us to alter the size of memory footprints and the locality of memory resources. (2) To fully evaluate the attack effects on real-world applications, we choose 8 macro benchmarks (6 from SPEC2006 [10] and 2 from PARSEC [16]) and cryptographic applications based on OpenSSL as the victim program. Each experiment was repeated 10 times, and the mean values and standard deviations are reported.

3.2 Cache Contention (Storage Resources)

Of the storage-based contention attacks, we found that the LLC contention results in the most severe performance degradation. The root vulnerability is that an LLC is shared by all cores of the same CPU package, without access control or quota enforcement. Therefore a program in one VM can evict LLC cache lines belonging to another VM. Moreover, inclusive LLCs (e.g., most modern Intel LLCs) will propagate these cache line evictions to core-private L1 and L2 caches, further aggravating the interference between programs (or VMs) in CPU caches.

Contention Study

Cache cleansing.  To cause LLC contention, the adversary can allocate a memory buffer to cover the entire LLC. By accessing one memory address per memory block in the buffer, the adversary can cleanse the entire cache and evict all of the victim’s data from the LLC to the DRAM.

The optimal buffer used by the attacker should exactly map to the LLC, which means it can fill up each cache set in each LLC slice without self-conflicts (i.e., evicting earlier lines loaded from this buffer). For example, for a LLC with slices, sets in each slice, and -way set-associativity, the attacker would like memory blocks to cover all cache lines of all sets in all slices. There are two challenges that make this task difficult for the attacker: the host physical addresses of the buffer to index the cache slice are unknown to the attacker, and the mapping from physical memory addresses to LLC slices is not publicly known.

Mapping LLC cache slices: To overcome these challenges, the attacker can first allocate a 1 Hugepage which is guaranteed to have continuous host physical addresses; thus he need not worry about virtual to physical page translations which he does not know. Then for each LLC cache set in all slices, the attacker sets up an empty group , and starts the following loop: (i) select block from the Hugepage, which is mapped to set by the same index bits in the memory address; (ii) add to ; (iii) access all the blocks in ; and (iv) measure the average access latency per block. A longer latency indicates block causes self-conflict with other blocks in , so it is removed from . The above loop is repeated until there are blocks in , which can exactly fill up set in all slices. Next the attacker needs to distinguish which blocks in belong to each slice: He first selects a new block mapped to set from the Hugepage, and adds it to . This should cause a self-conflict. Then he executes the following loop: (i) select one block from ; (ii) remove it from ; (iii) access all the blocks in ; and (iv) measure the average access latency. A short latency indicates block can eliminate the self-conflict caused by , so it belongs to the same slice as . The attacker keeps doing this until he discovers blocks that belong to the same slice as . These blocks form the group that can fill up set in one slice. The above procedure is repeated till the blocks in are divided into slices for set . After conducting the above process for each cache set, the attacker obtains a memory buffer with non-consecutive blocks that map exactly to the LLC. Such self-conflict elimination is also useful in improving side-channel attacks [27].

To test the effectiveness of cache cleansing, we arranged the attacker VM and victim VM on the same processor package, thus sharing the LLC and all memory resources in lower layers. The adversary first identified the memory buffer that maps to the LLC. Then he cleansed the whole LLC repeatedly. The resulting performance degradation of the victim application is shown in Figure 3. The victim suffered from the most significant performance degradation when the victim’s buffer size is around 10 (1.8 slowdown for the high locality program, and 5.5 slowdown for the low locality program). When the buffer size is smaller than 5 (data are stored mainly in upper-level caches), or the size is larger than 25 (data are stored mainly in the DRAM), the impact of cache contention on LLC is negligible.

Figure 3: Performance slowdown due to LLC cleansing contention. We use “H-x” or “L-x” to denote the victim program has high or low memory locality and has a buffer size of x.

The results can be explained as follows: the maximum performance degradation can be achieved on victims with memory footprint smaller than, but close to, the LLC size, which is 15, because the victim suffers the least from self conflicts in LLC and the most from the attacker’s LLC cleansing. Moreover, as a low locality program accesses its data in a random order, hardware prefetching is less effective in enhancing the program’s access speed. So the program accesses the cache at a relatively lower rate. Its data will be evicted out of the LLC by the attacker with higher probability. That is why the LLC cleansing has a larger impact on low locality programs than on high locality programs.

Takeaways.  LLC contention is (more) effective when (1) the attacker and victim VMs share the same LLC, (2) the victim program’s memory footprint is about the size of LLC, and (3) the victim program has lower memory locality.

Practical Attack Evaluation

We improve this attack by increasing the cleansing speed, and the accuracy of evicting (thus contending with) the victim’s data.

Multi-threaded LLC cleansing.  To speed up the LLC cleansing, the adversary may split the cleansing task into threads, with each running on a separate vCPU and cleansing only a non-overlapping of the LLC simultaneously. This effectively increases the cleansing speed by times.

In our experiment, the attacker VM and the victim VM were arranged to share the LLC. The attacker VM was assigned 4 vCPUs. It first prepared the memory buffer that exactly mapped to the LLC. Then he cleansed the LLC with (1) one vCPU; (2) 4 vCPUs (each cleansing 1/4 of the LLC). Figure 4 shows that the attack can cause slowdown to the victim VM when using one thread, and slowdown when using four threads.

Figure 4: Performance slowdown due to multi-threaded LLC cleansing attack

Adaptive LLC cleansing.  The basic LLC cache cleansing technique does not work when the victim’s program has a memory footprint (1) that is much smaller than an LLC (e.g., 15), since it takes a long time to finish one complete LLC cleansing, where most of the memory accesses do not induce contention with the victim. To achieve finer-grained attacks, we developed a cache probing technique to pinpoint the cache sets in the LLC that map to the victim’s memory footprint, and cleanse only these selected sets.

The attacker first allocates a memory buffer covering the entire LLC in his own VM. Then he conducts cache probing in two steps: (1) In the Discover Stage, while the victim program runs, for each cache set, the attacker accesses some cache lines belonging to this set and figures out the maximum number of cache lines which can be accessed without causing cache conflicts. If this number is smaller than the set associativity, this cache set will be selected to conduct adaptive cleansing attacks, because the victim has frequently occupied some cache lines in this set; (2) In the Attack Stage, the attacker keeps accessing these selected cache sets to cleanse the victim’s data. Algorithm 4 shows the steps to perform the adaptive LLC cleansing.


[t] \SetAlgoLined\KwIn \Indpcache_set{}: all the sets in the LLC
cache_buffer{}: cover the entire LLC
cache_assoc_num: the associativity of LLC

/* Discover Stage */
\Foreach set i in cache_set{} Find out j, s.t., accessing j cache lines in set i from cache_buffer{} has no cache conflict (low accessing time), but accessing j+1 cache lines in set i from cache_buffer{} has cache conflict (high accessing time)
\Ifjcache_assoc_num add i to victim_set{}
\BlankLine\BlankLine/* Attack Stage: */
\Whileattack is not finished \Foreach set i in victim_set{} access cache_assoc_num cache lines in set i from cache_buffer{} Adaptive LLC cleansing

Figure 5 shows the results of the attacker’s multi-threaded adaptive cleansing attacks against victim applications with cryptographic operations. While the basic cleansing did not have any effect, the adaptive attacks can achieve around 1.12 to 1.4 times runtime slowdown with 1 vCPU, and up to 4.4 slowdown with 4 vCPUs.

Figure 5: Performance slowdown due to adaptive LLC cleansing attacks

3.3 Bus Contention (Scheduling Resources)

The availability of internal memory buses can be compromised by overwhelming or temporarily locking down the buses. We study the effects of these techniques.

Contention Study

Bus saturation.  One intuitive approach for an adversary is to create numerous memory requests to saturate the buses [43]. However, the bus bandwidth in modern processors may be too high for a single VM to saturate.

To examine the effectiveness of bus saturation contention, we conducted two sets of experiments. In the first set of experiments, the victim VM and the attacker VM were located in the same processor package but on different physical cores (Figure a). They accessed different parts of the LLC, without touching the DRAM. Therefore the attacker VM causes contention in the ring bus that connects LLC slices without causing contention in the LLC itself. In the second set of experiments, the victim VM and the attacker VM were pinned on different processor packages (Figure b). They accessed different memory channels, without inducing contention in the memory controller and DRAM modules. Therefore the attacker and victim VMs only contend in buses that connect LLCs and IMCs, as well as the QPI buses. The attacker and victim were assigned increasing number of vCPUs to cause more bus traffic. Results in Figure 6 show that these buses were hardly saturated and the impact on the victim’s performance was negligible in all cases.

Figure 6: Performance slowdown due to bus saturation contention. We use “H-xc” or “L-xc” to denote the configuration that the victim program has high or low locality, and both of the attacker and victim use x cores to contend for the bus.

Bus locking.  To deny the victim from being scheduled by a scheduling resource, the adversary can temporarily lock down the internal memory buses. Intel processors provide locked atomic operations for managing shared data structures between multi-processors [6]. Before Intel Pentium (P5) processors, the locked atomic operations always generate LOCK signals on the internal buses to achieve operation atomicity. So other memory accesses are blocked until the locked atomic operation is completed. For processor families after P6, the bus lock is transformed into a cache lock: the cache line is locked instead of the bus and the cache coherency mechanism is used to ensure operation atomicity. This causes much smaller scheduling lockdown times.

However, we have found two exotic atomic operations the adversary can still use to lock the internal memory buses: (1) Locked atomic accesses to unaligned memory blocks: the processor has to fetch two adjacent cache lines to complete this unaligned memory access. To guarantee the atomicity of accessing the two adjacent cache lines, the processors will flush in-flight memory accesses issued before, and block memory accesses to the bus, until the unaligned memory access is finished. (2) Locked atomic accesses to uncacheable memory blocks: when uncached memory pages are accessed in atomic operations, the cache coherency mechanism does not work. Hence, the memory bus must be locked to guarantee atomicity. Listings 1 and 2 show the codes for issuing unaligned and uncached atomic operations. The two programs keep conducting the addition operation of a constant (x) and a memory block (block_addr) (line 5 – 11): in line 7, the lock prefix indicates this operation is atomic. The instruction xaddl indicates this is an addition operation. The first operand is the register eax, which stores x (line 9). The second operand is the first parameter of line 9 (data denoted by the address block_addr). The results will be loaded to the register eax (line 8). In Listing 1, we set this memory block as unaligned (line 4). In Listing 2, we added a new system call to set the page table entries of the memory buffer as cache disabled (line 2).

3  int x = 0x0;
4  int *block_addr = (int *)(buffer+CACHE_LINE_SIZE-1);
5  while (1) {
6    __asm__(
7      "lock; xaddl %%eax, %1\n\t"
8      :"=a"(x)
9      :"m"(*block_addr), "a"(x)
10      :"memory");
11  }
Listing 1: Attack using unaligned atomic operations
2  syscall(__NR_UnCached, (unsigned long)buffer);
3  int x = 0x0;
4  int *block_addr = (int *)buffer;
5  while (1) {
6    __asm__(
7    "lock; xaddl %%eax, %1\n\t"
8    :"=a"(x)
9    :"m"(*block_addr), "a"(x)
10    :"memory");
11  }
Listing 2: Attack using uncached atomic operations

To evaluate the effects of bus locking contention, we chose the footprint size of the victim program as (1) 8, with which the L1 cache was under-utilized, (2) 64, with which the L1 cache was over-utilized but the L2 cache was under-utilized, (3) 512, with which the L2 cache was over-utilized but the LLC was under-utilized, and (4) 30, with which the LLC was over-utilized. The attacker VM kept issuing unaligned atomic or uncached atomic memory accesses to lock the memory buses. For comparison, we also run another group of experiments, where the attacker kept issuing normal locked memory accesses. We considered two scenarios: (1) the attacker and victim shared the same processor package, but run on different cores; (2) they were scheduled on different processor packages. The normalized execution time of the victim program is shown in Figure 7.

Figure 7: Performance slowdown due to bus locking contention. We use “H-x” or “L-x” to denote the victim program has high or low memory locality and has a buffer size of x.

We observe that the victim’s performance was significantly affected when the its buffer size was larger than the L2 caches. This is because the attacker who kept requesting atomic, unaligned memory accesses was only able to lock the buses within its physical cores, the ring buses around the LLCs in each package, the QPI, and the buses from each package to the DRAM. So when the victim’s buffer size was smaller than LLC, it fetched data from the private caches in its own core without being affected by the attacker. However, when the victim’s buffer size was larger than the L2 caches, its access to the LLC would be delayed by the bus locking operations, and the performance is degraded (up to slowdown for high locality victim programs and slowdown for low locality victim programs).

Takeaways.  We explored two approaches to bus contention. Saturating internal buses is unlikely to cause noticeable performance degradation. Bus locking shows promise when the victim program makes heavy use of the shared LLC or lower layer memory resources, whenever the victim VM and attacker VM are on the same processor package or different packages.

Practical Attack Evaluation

To evaluate the effectiveness of atomic locking attacks on real-world applications, we scheduled the attacker VM and victim VM on different processor packages. The attacker VM kept generating atomic locking signals by (1) requesting unaligned atomic memory accesses, or (2) requesting uncached atomic memory accesses. The normalized execution time of the victim program is shown in Figure 8. We observe that the victim’s performance can be degraded as much as 7 times when the attacker conducted exotic atomic operations.

Figure 8: Performance slowdown due to bus locking attacks.

3.4 Memory Contention (Combined Resources)

An IMC uses the bank scheduler and channel scheduler to select the memory requests for each DRAM access. Therefore an adversary may contend on these two schedulers by frequently issuing memory requests that result in bank buffer hits to boost his priority in the scheduler. Moreover, each memory bank is equipped with only one bank buffer to hold the recently used bank row, so the adversary can easily induce storage-based contention on bank buffers by frequently occupying them with his own data.

Contention Study

Memory flooding.  Since channel and bank schedulers use First-Come-First-Serve policies, an attacker can send a large amount of memory requests to flood the target memory channels or DRAM banks. These requests will contend on the scheduling-based resources with the victim’s memory requests. In addition, the attacker can issue requests in sequential order to achieve high row-hit locality and thus high priority in the bank scheduler, to further increase the effect of flooding. Furthermore, when the adversary keeps flooding the IMCs, these memory requests can also evict the victim’s data out of the DRAM bank buffers. The victim’s bank buffer hit rate is decreased and its performance is further degraded.

To demonstrate the effects of DRAM contention, we configure one attacker VM to operate a memory flooding program, which kept accessing memory blocks in the same DRAM bank directly without going through caches (i.e., uncached accesses). The victim VM did exactly the same with either high or low memory locality. We conducted two sets of experiments: (1) The two VMs access the same bank in the same channel (Same bank in Figure 9); (2) the two VMs access two different banks in the same channel (Same channel in Figure 9). To alter the memory request rate issued by the two VMs, we also changed the number of vCPUs in the attacker and victim VMs. The normalized execution time of the victim program is shown in Figure 9.

Three types of contention were observed in these experiments. First, channel scheduling contention was observed when the attacker and the victim access different banks in the same channel. It was enhanced with increased number of attacker and victim vCPUs, thus increasing the memory request rate (around slowdown for “H-5c” and “L-5c”). Second, bank scheduling contention was also observed when the attacker and victim accessed the same DRAM bank. When the memory request rate was increased, the victim’s performance was further degraded by an additional 70% and 25% for “H-5c” and “L-5c”, respectively. Third, contention in DRAM bank buffers was observed when we compare the results of “Same bank” in Figure 9 between high locality and low locality victim programs —low locality victims already suffer from row-misses and the additional performance degradation in high locality victims is due to bank buffer contention ( slowdown for H_5c verses slowdown for L_5c).

Figure 9: Performance degradation due to memory channel and bank contention. We use “H-xc” or “L-xc” to denote the configuration that the victim program has high or low locality, and both of the attacker and victim use x cores to contend for the bus.

We consider the overall effect of memory flooding contention. In this experiment, the victim VM runs a high locality or low locality stream benchmark on its only vCPU. The attacker VM allocates a memory buffer with the size that of the LLC and runs a stream program which keeps accessing memory blocks sequentially in this buffer to generate contention in every channel and every bank. To increase bus traffic, the attacker employed multiple vCPUs to perform the attack simultaneously. The performance degradation, as we can see in Figure 10, was significant when the victim’s memory accesses footprint was mostly in the DRAM, and more vCPUs of the attacker VM were used in the attack. The attacker can use 8 vCPUs to induce about slowdown to the victim with the buffer size larger than LLC.

Figure 10: Performance degradation due to memory flooding contention. We use “H-x” or “L-x” to denote the victim program has high or low memory locality and has a buffer size of x.

Takeaways.  Contention can be induced in channel schedulers, bank schedulers and bank buffers between different programs from different processor packages. This contention is especially significant when the victim program’s memory footprint is larger than the LLC.

Practical Attack Evaluation

We evaluate two advanced memory flooding attacks.

Multi-threaded memory flooding.  The attacker can use more threads to increase the memory flooding speed, as we demonstrated in the previous section. We evaluated this attack on real-world applications. The attacker and victim VMs are located in two different processor packages, so they only share the IMCs and DRAM. The attacker VM issues frequent, highly localized memory requests to flood every DRAM bank and every channel. To increase bus traffic, the attacker employed 8 vCPUs to perform the attack simultaneously. Figure 11 shows that the victim experiences up to a runtime slowdown when the attacker uses 8 vCPUs to generate contention (Complete Memory Flooding bars).

Figure 11: Performance overhead due to multi-threaded and adaptive memory flooding attacks.

Adaptive memory flooding.  For a software program with smaller memory footprint, only a few memory channels will be involved in its memory accesses. We developed a novel approach with which an adversary may identify memory channels that are more frequently used by a victim program. To achieve this, the attacker needs to reverse engineer the unrevealed algorithms that map the physical memory addresses to memory banks and channels, in order to accurately direct the flows of the memory request flood.

Mapping DRAM banks and channels: The attacker can leverage methods due to Liu et al. [29] to identify the bits in physical memory addresses that index the DRAM banks. The attacker first allocates a 1 Hugepage with continuous physical addresses, which avoids the unknown translations from guest virtual addresses to machine physical addresses. Then he selects two memory blocks from the Hugepage whose physical addresses differ in only one bit. He then flushes these two blocks out of caches and accesses them from the DRAM alternatively. A low latency indicates these two memory blocks are served in two banks as there is no contention on bank buffers. In this way, the attacker is able to identify all the bank bits. Next, the attacker needs to identify the channel bits among the bank bits. We design a novel algorithm which is shown in Algorithm 11 to achieve this goal. The attacker selects two groups of memory blocks from the Hugepage, whose bank indexes differ in only one bit. The attacker then allocates two threads to access the two groups simultaneously. If the different bank index bit is also a channel index bit, then the two groups will be in two different channels, and a shorter access time will be observed since there is no channel contention.


bank_bit{} // bank index bits
memory_buffer{} // a memory buffer
\Indm\KwOut \Indpchannel_bit{}
\Indm\Begin channel_bit{}=
\Foreach bit i bank_bit{} buffer_A{}=memory_buffer{}
\Foreach memory block d_a buffer_A{} m_a = physical address of d_a
\If(m_a’s bit i) 0 delete d_a from buffer_A{}
\Foreach bit j bank_bit{} and i j \If(m_a’s bit j) 0 delete d_a from buffer_A{}
\Foreach memory block d_b buffer_B{} m_b = physical address of d_b
\If(m_b’s bit i) 1 delete d_b from buffer_B{}
\Foreach bit j bank_bit{} and i j \If(m_b’s bit j) 0 delete d_b from buffer_B{}
thread_A: // access buffer_A in an infinite loop
\While(true) \Foreach memory block d_a buffer_A{} access d_a (uncached)
thread_B:// access buffer_B N times and measure time
\Fori=0 to N-1 \Foreach memory block d_b buffer_B{} access d_b (uncached)
total_time = thread_B’s execution time;
\Iftotal_timeThreshold add i to channel_bit{} \KwRetchannel_bit{} Discovering channel index bits

Then the attacker performs two stages: in the Discover Stage, the attacker keeps accessing each memory channel for a number of times and measures his own memory access time to infer contention from the victim program. By identifying the channels with a longer access time, the attacker can detect which channels are heavily used by the victim. Note that the attacker only needs to discover the channels used by the victim, but does not need to know the exact value of channel index bits for a given channel. In the Attack Stage, the attacker floods these selected memory channels. Algorithm 11 shows the two steps to conduct adaptive memory flooding attacks.


[t] \SetAlgoLined\KwIn \Indpmemory_channel{}: all the channels in the memory
\Indm\Begin /* Discover Stage */
\Foreach channel i in memory_channel{} access the addresses belonging to channel i from memory_buffer{}, and measure the total time (repeat for a number of times)
\Iftotal time is high add i to victim_channel{}
\BlankLine\BlankLine/* Attack Stage */
\Whileattack is not finished \Foreach channel i in victim_channel{} access the addresses belonging to channel i from memory_buffer{} Adaptive memory flooding

Figure 11 shows the results when the attacker VM uses 8 vCPUs to generate contention in selected memory channels which are heavily used by the victim. These adaptive memory flooding attacks cause slowdown while indiscriminately flooding the entire memory causes only slowdown.

4 Case Studies in Amazon EC2

We now evaluate our memory DoS attacks in a real cloud environment, Amazon EC2. We provide two case studies: memory DoS attacks against distributed applications, and against E-Commerce websites.

Legal and ethical considerations.  As our attacks only involve memory accesses within the attacker VM’s own address space, the experiments we conducted in this section conformed with EC2 customer agreement. Nevertheless, we put forth our best efforts in reducing the duration of the attacks to minimally impact other users in the cloud.

VM configurations.  We chose the same configuration for the attacker and victim VMs: t2.medium instances with 2 vCPUs, 4 memory and 8 disk. Each VM ran Ubuntu Server 14.04 LTS with Linux kernel version 3.13.0-48-generic, in full virtualization mode. All VMs were launched in the us-east-1c region. Information exposed through lscpu indicated that these VMs were running on 2.5 Intel Xeon E5-2670 processors, with a 32 L1D and L1I cache, a 256 L2 cache, and a shared 25 LLC.

For all the experiments in this section, the attacker employs exotic atomic locking (Sec. 3.3) and LLC cleansing attacks (Sec. 3.2), where each of the 2 attacker vCPUs was used to keep locking the memory and cleansing the LLC. Memory contention attacks (Section 3.4) are not used since they cause much lower performance degradation (availability loss) to the victim.

VM co-location in EC2.  The memory DoS attacks require the attacker and victim VMs to co-locate on the same machine. Past work [36, 39, 46] have proven the feasibility of such co-location attacks in public clouds. While cloud providers adopt new technologies (e.g., Virtual Private Cloud [3]) to mitigate prior attacks in [36], new ways are discovered to test and detect co-location in [39, 46]. Specifically, Varadarajan et al. [39] achieved co-location in Amazon EC2, Google Compute Engine and Microsoft Azure with low-cost (less than $8) in the order of minutes. They verified co-location with various VM configurations, launch delay between attacker and victim, launch time of day, datacenter location, etc.. Xu et al . [46] used similar ideas to achieve co-location in EC2 Virtual Private Cloud. We also applied these techniques to achieve co-location in Amazon EC2. In our experiments, we simultaneously launched a large number of attacker VMs in the same region as the victim VM. A machine outside EC2 under our control sent requests to static web pages hosted in the target victim VM. Each time we select one attacker VM to conduct memory DoS attacks and measure the victim VM’s response latency. Delayed HTTP responses from the victim VM indicates that this attacker was sharing the machine with the victim.

4.1 Attacking Distributed Applications

We evaluate memory DoS attacks on a multi-node distributed application deployed in a cluster of VMs, where each VM is deployed as one node. We show how much performance degradation an adversary can induce to the victim cluster with minimal cost, using a single co-located attacker VM.

Experiment settings.  We used Hadoop as the victim system. Hadoop consists of two layers: MapReduce for data processing, and Hadoop Distributed File System (HDFS) for data storage. A Hadoop cluster includes a single master node and multiple slave nodes. The master node acts as both the Job Tracker for scheduling map or reduce jobs and the NameNode for hosting HDFS indexes. Each slave node acts as both the Task Tracker for conducting the map or reduce operations and the DataNode for storing data blocks in HDFS. We deployed the Hadoop system with different numbers of VMs (5, 10, 15 or 20), where one VM was selected as the master node and the rest were the slave nodes.

The attacker only used one VM to attack the cluster. He either co-located the malicious VM with the master node or one of the slave nodes. We ran four different Hadoop benchmarks to test how much performance degradation the single attacker VM can cause to the Hadoop cluster. Each experiment was repeated 5 times. Figure 12 shows the mean values of normalized execution time and one standard deviation.

Figure 12: Performance slowdown of the Hadoop applications due to memory DoS attacks.

MRBench:  This benchmark tests the performance of the MapReduce layer of the Hadoop system: it runs a small MapReduce job of text processing for a number of times. We set the number of mappers and reducers as the number of slave nodes for each experiment. Figure a shows that attacking a slave node is more effective since the slave node is busy with the map and reduce tasks. In a large Hadoop cluster with 20 nodes, attacking just one slave node introduces slowdown to the entire distributed system.

TestDFSIO:  We use TestDFSIO to evaluate HDFS performance. This benchmark writes and reads files stored in HDFS. We configured it to operate on files with the size of 500, where is the number of slave nodes in the Hadoop cluster. Figure b shows that attacking the slave node is effective: the adversary can achieve about 2 slowdown.

NNBench:  This program is also used to benchmark HDFS in Hadoop. It generates HDFS-related management requests on the master node of HDFS. We configured it to operate on small files, where is the number of slave nodes in the Hadoop cluster. Since the master node is heavily used for serving the HDFS requests, attacking the master node can introduce up to 3.4 slowdown to the whole Hadoop system, as shown in Figure c.

Terasort:  We use this benchmark to test the overall performance of both MapReduce and HDFS layers in the Hadoop cluster. TeraSort generates a large set of data and uses map/reduce operations to sort the data. For each experiment, we set the number of mappers and reducers to , and the size of data to be sorted to , where is the number of slave nodes in the Hadoop cluster. Figure d shows that attacking the slave node is very effective: it can bring slowdown to the entire Hadoop system.

Summary.  The adversary can deny working memory availability to the victim VM and thus degrade an important distributed system’s performance with minimal costs: it can use just one VM to interfere with one of 20 nodes in the large cluster. The slowdown of a single victim node can cause up to 3.7 slowdown to the whole system.

4.2 Attacking E-Commerce Websites

A web application consists of load balancers, web servers, database servers and memory caching servers. Memory DoS attacks can disturb an E-commerce web application by attacking various components.

Experiment settings.  We chose a popular open source E-commerce web application, Magento [7], as the target of the attack. The victim application consists of five VMs: a load balancer based on Pound for balancing network requests; two Apache web servers to process and deliver web requests; a MySQL database server to store customer and merchandise information; and a Memcached server to speed up database transactions. The five VMs were hosted on different cloud servers in EC2. The adversary is able to co-locate his VMs with one or multiple VMs that host the victim application. We measure the application’s latency and throughput to evaluate the effectiveness of the attack.

Latency.  We launched a client on a local machine outside of EC2. The client employed httperf [12] to send HTTP requests to the load balancer with different rates (connections per second) and we measured the average response time. We evaluated the attack from one or all co-located VMs. Each experiment was repeated 10 times and the mean and standard deviation of the latency are reported in Figure a. This shows that memory contention on database, load balancer or memcached servers do not have much impact on the overall performance of the web application, with only up to 2 degradation. This is probably because these servers were not heavily used in these cases. Memory DoS attacks on web servers were the most effective (17 degradation). When the adversary can co-locate with all victim servers and each attacker VM induces contention with the victim, the web server’s HTTP response time was delayed by , for a request rate of 50 connections per second.

Server throughput.  Figure b shows the results of another experiment, where we measured the throughput of each victim VM individually, under memory DoS attacks. We used ApacheBench [1] to evaluate the load balancer and web servers, SysBench [11] to evaluate the database server and memtier_benchmark [8] to evaluate the memcached server. This shows memory DoS attacks on these servers were effective: the throughput can be reduced to only 13% 70% under malicious contention by the attacker.

Figure 13: Latency and throughput of the Magento application due to memory DoS attacks.

Summary.  The adversary can compromise the quality of E-commerce service and cause financial loss in two ways: (1) long response latency will affect customers’ satisfaction and make them leave this E-commerce website [35]; (2) it can cause throughput degradation, reducing the number of transactions completed in a unit time. The cost for these attacks is relatively cheap: the adversary only needs a few VMs to perform the attacks, with each t2.medium instance costing $0.052 per hour.

5 Defense against Memory DoS Attacks

We propose a novel, general-purpose approach to detecting and mitigating memory DoS attacks in the cloud. Unlike some past work, our defense does not require prior profiling of the memory resource usage of the applications. Our defense can be provided by the cloud providers as a new security service to customers. We denote as Protected VMs those VMs for which the cloud customers require protection. To detect memory DoS attacks, lightweight statistical tests are performed frequently to monitor performance changes of the Protected VMs (Sec. 5.1). To mitigate the attacks, execution throttling is used to reduce the impact of the attacks (Sec. 5.2). A novelty of our approach is the combined use of two existing hardware features: event counting using hardware performance counters controllable via the Performance Monitoring Unit (PMU) and duty cycle modulation controllable through the IA32_CLOCK_MODULATION Model Specific Register (MSR).

5.1 Detection Method

The key insight in detecting memory DoS attacks is that such attacks are caused by abnormal resource contention between Protected VMs and attacker VMs, and such resource contention can significantly alter the memory usage of the Protected VM, which can be observed by the cloud provider. We postulate that the statistics of accesses to memory resources, by a phase of a software program, follow certain probability distributions. When a memory DoS attack happens, these probability distributions will change. Figure 14 shows the probability distributions of the Protected VM’s memory access statistics, without attacks (black), and with two kinds of attacks (gray and shaded), when it runs one of four applications introduced in Sec. 4.2, i.e., the Apache web server, Mysql database, Memcached and Pound load-balancer. When an attacker is present, the probability distribution of the Protected VM’s memory access statistics (in this case, memory bandwidth in GigaBytes per second) changes significantly.

Figure 14: Probability distributions of the Protected VM’s memory bandwidth.

In practice, only samples drawn from the underlying probability distribution are observable. Therefore, the provider’s task is to collect two sets of samples: [, , …, ] are reference samples collected from the probability distribution when we are sure that there are no attacks; [, , …, ] are monitored samples collected from the Protected VM at runtime, when attacks may occur. If these two sets of samples are not drawn from the same distribution, we can conclude that the performance of the Protected VM is hindered by its neighboring VMs. When the distance between the two distributions is large, we may conclude the Protected VM is under some memory DoS attacks.

We propose to use the two-sample Kolmogorov-Smirnov (KS) tests [30], as a metric for whether two samples belong to the same probability distribution. The KS statistic is defined in Equation 1, where is the empirical distribution function of the samples , and is the supremum function (i.e., returning the maximum value). Superscripts M and R denote the monitored samples and reference samples, respectively. and are the number of monitored samples and reference samples.


Null hypothesis for KS test.  We establish the null hypothesis that currently monitored samples are drawn from the same distribution as the reference samples. Benign performance contention with non-attacking, co-tenant VMs will not alter the probability distribution of the Protected VM’s monitored samples significantly, so the KS statistic is small and the null hypothesis is held. Equation 2 introduces : We can reject the null hypothesis with confidence level if the KS statistic, , is greater than predetermined critical values . Then, the cloud provider can assume, with confidence level , that a memory DoS attack exists, and trigger a mitigation strategy.

While monitored samples, , are simply collected at runtime, reference samples, , ideally should be collected when the Protected VM is not affected by other co-located VMs. The technical challenge here is that if these samples are collected offline, we need to assume the memory access statistics of the VM never change during its life time, which is unrealistic. If samples are collected at runtime, all the co-locating VMs need to be paused during sample collection, which, if performed frequently, can cause significant performance overhead to benign, co-located VMs.

Figure 15: Illustration of monitoring the Protected VM (a) and identifying the attack VM (b). The blue “✴” means the null hypothesis is accepted; while the red “✸” means the null hypothesis is rejected.

Pseudo Isolated Reference Sampling.  To address this technical challenge, we use execution throttling to collect the reference samples at runtime. The basic idea is to throttle down the execution speed of other VMs, but maintain the Protected VM’s speed during the reference sampling stage. This can reduce the co-located VMs’ interference without pausing them.

Execution throttling is based on a feature provided in Intel Processors called duty cycle modulation [6], which is designed to regulate each core’s execution speed and power consumption. The processor allows software to assign “duty cycles” to each CPU core: the core will be active during these duty cycles, and inactive during the non-duty cycles. For example, the duty cycle of a core can be set from 16/16 (no throttling), 15/16, 14/16, …, down to 1/16 (maximum throttling). Each core uses a model specific register (MSR), IA32_CLOCK_MODULATION, to control the duty cycle ratio: bit 4 of this MSR denotes if the duty cycle modulation is enabled for this core; bits 0-3 represent the number of 1/16 of the total CPU cycles set as duty cycles.

In execution throttling, the execution speed of other VMs will be throttled down and very little contention is induced to the Protected VM. As such, reference samples collected during the execution throttling stage are drawn from a quasi contention-free distribution.

Figure a illustrates the high-level strategy for monitoring Protected VMs. The reference samples are collected during the reference sampling periods (), where other VMs’ execution speeds are throttled down. The monitored samples are collected during the monitored sampling periods (), where co-located VMs run normally, without execution throttling. KS tests are performed right after each monitored sample is collected, and probability distribution divergence is estimated by comparing with the most recent reference samples. Monitored samples are collected periodically at a time interval of , and reference samples are collected periodically at a time interval of . We can also randomize the intervals and for each period to prevent the attacker from reverse-engineering the detection scheme and scheduling the attack phases to avoid detection.

If the KS test results reject the null hypothesis, it may be because the Protected VM is in a different execution phase with different memory access statistics, or it may be due to memory DoS attacks. To rule out the first possibility, double checking automatically occurs since reference samples are re-collected and updated after a time interval of . If deviation of the probability distribution still exists, attacks can be confirmed.

5.2 Mitigation Method

The cloud provider has several methods to mitigate the attack. One is VM migration, which can be achieved either by reassigning the vCPUs of a VM to a different CPU package, when the memory resource being contended is in the same package (e.g., LLC), or by migrating the entire VM to another server, when the memory resource contended is shared system-wide (e.g., memory bus). However, such VM migration can not completely eliminate the attacker VM’s impact on other VMs.

An alternative approach is to identify the attacker VM, and then employ execution throttling to reduce the execution speed of the malicious VM, while meanwhile the cloud provider conducts further investigation and/or notifies the customer of the suspected attacker VM of observed resource abuse activities.

Identifying the attacker VM.  Once memory DoS attacks are detected, to mitigate the threat, the cloud provider needs to identify which of the co-located VMs is conducting the attack. Here we propose a novel approach to identify malicious VMs based on selective execution throttling in a binary search manner: First, half of the co-located VMs keep normal execution speed while the rest of VMs are throttled down during reference sampling periods (Figure b, 2nd Reference Sampling period). If in this case, reference samples and monitored samples are drawn from the same distribution, then there are malicious VMs among the ones not throttled down during the reference sampling period. Then, we select half of the remaining VMs to be throttled while all the other VMs are in normal speed, to collect the next reference samples. In Figure b, this is the 3rd Reference Sampling period, where only VM3 is throttled. Since the subsequent monitored samples have a different distribution compared to this Reference Sample, VM3 is identified as the attack VM. Note that if there are multiple attacker VMs on the server, we can use the above procedure to find one VM each time and repeat it until all the attacker VMs are found. By organizing this search for the attacker VM or VMs as a binary search, the time taken to identify the source of memory contention is O (log ), where is the number of co-tenant VMs on the Protected VM’s server. Algorithm 5.2 shows how to detect attacker VMs using this selective execution throttling.


[ht] \SetAlgoLined


VM[1,…,n] /* set of co-tenant VMs */
\BlankLine\BlankLine\Indm\SetKwFunctionprocIdentifyAttacker \SetKwProgmyalgfunctionend


sub_VM /* sub_VM: set of VMs to identify */
\eIfsub_VM.length() = 1 \KwRetsub_VM[0] imin = 0
imax = sub_VM.length()-1
imid = (imin+imax)/2
reference_sample = DataCollect()
monitor_sample = DataCollect()
result = KSTest(reference_sample, monitor_sample)
\eIfresult = Reject \KwRet\procsub_VM[0,…,imid-1] \KwRet\procsub_VM[imid,…,imax] \BlankLine\BlankLine


vm = \procVM

Identifying and mitigating the attacker VMs that cause severe resource contention.

5.3 Implementation

We implement a prototype system of our proposed defense on the OpenStack platform. Figure 16 shows the defense architecture overview. We adopt the CloudMonatt architecture from [48]. Specifically, the system includes three types of servers. The Cloud Controller is the cloud manager that manages the VMs. It has a Policy Validation Module to receive and analyze customers’ requests. It also has a Response Module, which can throttle down the attacker VMs’ execution speed to mitigate memory DoS attacks. The Attestation Server is a centralized server for monitoring and detection of memory DoS attacks. It has a Verification Module that receives Protected VM’s performance probability distribution, detects memory DoS attacks and identifies malicious VMs.

Figure 16: Architecture overview.

On each of the cloud servers, we use the KVM hypervisor which is the default setup for OpenStack. Other virtualization platforms, such as Xen and HyperV, can also be used. Two software modules are installed on the host OS. A Detector measures the memory access characteristics of the Protected VM using Performance Monitoring Units (PMU), which are commonly available in most modern processors. A PMU provides a set of Hardware Performance Counters to count hardware-related events. In our implementation, we use the linux kernel API perf_event to measure the memory access statistics for the number of LLC accesses per sampling period. A Regulator is in charge of controlling VMs’ execution speed. It uses the wrmsr instruction to modify the IA32_CLOCK_MODULATION MSR to control the duty cycle ratio.

In our implementation, the parameters involved in reference and monitored sampling are as follows: , , . These values were selected to strike a balance between the performance overhead due to execution throttling and detection accuracy. In each sampling period, samples are collected, with each collected during a period of 10ms. We choose 10ms because it is short enough to provide accurate measurements, and long enough to return stable results. In the KS tests, the confidence level, , is set as 0.999, and the threshold to reject the null hypothesis is (given ). If 4 consecutive KS statistics larger than 0.276 are observed (the choice of 4 is elaborated in Sec. 5.4), it is assured that the Protected VM’s memory access statistics have been changed. Then to confirm that such changes are due to memory DoS attacks, reference samples will be refreshed and the malicious VM will be identified.

5.4 Evaluation

Our lab testbed comprised three servers. A Dell R210II Server (equipped with one quad-core, 3.30, Intel Xeon E3-1230v2 processor with 8 LLC) was configured as the Cloud Controller as well as the Attestation Server. Two Dell PowerEdge R720 Servers (one has two six-core, 2.90 Intel Xeon E5-2667 processors with 15 LLC, the other has one eight-core, 2.90 Intel Xeon E5-2690 processor with 20 LLC) were deployed to function as VM hosting servers.

Detection accuracy.  We deployed a Protected VM sharing a cloud server with 8 other VMs. Among these 8 VMs, one VM was an attacker VM conducting a multi-threaded LLC cleansing attack with 4 threads (Sec. 3.2), or an atomic locking attack (Sec. 3.3). The remaining 7 VMs were benign VMs running common linux utilities. The Protected VM runs one of the web, database, memcached or load-balancer applications in the Magento application (Sec. 4.2). The experiments consisted of four stages; the KS statistics of each of the four workloads during the four stages under the two types of attacks are shown in Figure 17.

Figure 17: KS statistics of the Protected VM for detecting and mitigating memory DoS attacks.

In stage I, the Protected VM runs while the attacker is idle. The KS statistic in this stage is relatively low. So we accept the null hypothesis that the memory accesses of the reference and monitored samples follow the same probability distribution. In stage II, the attacker VM conducts the LLC cleansing or atomic locking attacks. We observe the KS statistic is much higher than 0.276. The null hypothesis is rejected, signaling detection of potential memory DoS attacks. In stage III, the cloud provider runs three rounds of reference resampling to pinpoint the malicious VM. Resource contention mitigation is performed in stage IV: the cloud provider throttles down the attacker VM’s execution speed. After this stage, the KS statistic falls back to normal which suggests that the attacks are mitigated.

We also evaluated the false positive rates and false negative rates of two different criteria for identifying a memory access anomaly: 1 abnormal KS statistic (larger than the critical value ) or 4 consecutive abnormal KS statistics. Figure a shows the true positive rate of LLC cleansing and atomic locking attack detection, at different confidence levels . We observe that the true positive rate is always one (thus zero false negatives), regardless of the detection criteria (1 vs 4 abnormal KS tests). Figure b shows the false positive rate, which can be caused by background noise due to other VMs’ executions. This figure shows that using 4 consecutive abnormal KS statistics significantly reduces the false positive rate.

Figure 18: Detection accuracy.
Figure 19: Normalized performance of the Protected VM with throttling of memory DoS attacks.

Effectiveness of mitigation.  We evaluated the effectiveness of execution throttling based mitigation. The Protected VM runs the cloud benchmarks from the Magento application while the attacker VM runs LLC cleansing or atomic locking attacks. We chose different duty cycle ratios for the attacker VM. Figures a and b show the normalized performance of the Protected VM with different throttling ratios, under LLC cleansing and atomic locking attacks, respectively. The x-axis shows the duty cycle (x 1/16) given to the co-located VMs, going from no throttling on the left to maximum throttling on the right of each figure. The y-axis shows the Protected VM’s response latency (for web and load-balancer) or throughput (for memcached and database) normalized to the ones without attack. A high latency or a small throughput indicates that the performance of the Protected VM is highly taffected by the attacker VM. We can see that a smaller throttling ratio can effectively reduce the attacker’s impact on the victim’s performance. When the ratio is set as 1/16, the victim’s performance degradation caused by the attacker is kept within 12% (compared to degradation with no throttling) for LLC cleansing attacks. It is within 14% for atomic locking attacks (compared to degradation with no throttling).

Latency increase and mitigation.  We chose a latency-critical application, the Magento E-commerce application as the target victim. One Apache web server was selected as the Protected VM, co-locating with an attacker and 7 benign VMs running linux utilities. Figure 20 shows the response latency with and without our defense. The detection phase does not affect the Protected VM’s performance (stage I), since the PMU collects monitored samples without interrupting the VM’s execution. In stage II, the attack occurs and the defense system detects the Protected VM’s performance is degraded. In stage III, attacker VM identification is done. After throttling down the attacker VM in stage IV, the Protected VM’s performance is not affected by the memory DoS attacks. The latency during the attack in Phase II increases significantly, but returns to normal after mitigation in Phase IV.

Figure 20: Request latency of Magento Application

We also evaluated the performance overhead of co-located VMs due to execution throttling in the detection step. We launched one VM running one of the eight SPEC2006 or PARSEC benchmarks. Then we periodically throttle down this VM every 10s, 20s or 30s. Each time throttling lasted for 1s (the same value for and used earlier). The normalized performance of this VM is shown in Figure 21. We can see that when the server throttles this VM every 10s, the performance penalty can be around 10%. However, when the frequency is set to be 30s (our implementation choice), this penalty is smaller than 5%.

Figure 21: Performance overhead of co-located VMs due to monitoring.

6 Related Work

6.1 Resource Contention Attacks

Cloud DoS attacks.[28] proposed a DoS attack which can deplete the victim’s network bandwidth from its subnet. [15] proposed a network-initiated DoS attack which causes contention in the shared Network Interface Controller. [23] proposed cascading performance attacks which exhaust hypervisor’s I/O processing capability. [14] exploited VM migration to degrade the hypervisor’s performance. Our work is different as we exploit failure of isolation in the hardware memory subsystem (which has not been addressed by cloud providers), and not attacks on networks or hypervisors.

Cloud resource stealing attacks.[38] proposed the resource-freeing attack, where a malicious VM can steal one type of resource from the co-located victim VM by increasing this VM’s usage of other types of resources. [54] designed a CPU resource attack where an attacker VM can exploit the boost mechanism in the Xen credit scheduler to obtain more CPU resource than paid for. Our attacks do not steal extra cloud resources. Rather, we aim to induce the maximum performance degradation to the co-located victim VM targets.

Hardware resource contention studies.[21] studied the effect of trace cache evictions on the victim’s execution with Hyper-Threading enabled in an Intel Pentium 4 Xeon processor. [43] explored frequently flushing shared L2 caches on multicore platforms to slow down a victim program. They studied saturation and locking of buses that connect L1/L2 caches and the main memory [43]. [32] studied contention attacks on the schedulers of memory controllers. However, due to advances in computer hardware design, caches and DRAMs are larger and their management policies more sophisticated, so these prior attacks may not work in modern cloud settings.

Timing channels in clouds.  Prior studies showed that shared memory resources can be exploited by an attacker to extract crypto keys from the victim VM using cache side-channel attacks in cloud settings [51, 52, 27], or to transmit information, using cache operations [36, 45] or bus activities [44] in covert channel communications between two VMs. Unlike side-channel attacks our memory DoS attacks aim to maximize the effects of resource contention, while resource contention is an unintended side-effect of side-channel attacks. To maximize contention, we addressed various new challenges, e.g., finding which attacks cause greatest resource contention (exotic bus locking versus memory controller attacks), maximizing the frequency of resource depletion, and minimizing self-contention. To the best of our knowledge, we are the first to show that similar attack strategies (enhanced for resource contention) can be used as availability attacks as well as confidentiality attacks.

6.2 Eliminating Resource Contention

VM performance monitoring.  Public clouds offer performance monitoring services for customers’ VMs and applications, e.g., Amazon CloudWatch [2], Microsoft Azure Application Insights [9], Google Stackdriver [4], etc.. However, these services only monitor CPU usage, network traffic and disk bandwidth, but not low-level memory usage. To measure a VM’s performance without contention for reference sampling, past work offer three ways: (1) collecting the VM’s performance characteristics before it is deployed in the cloud [19, 53]; (2) measuring the performance of other VMs which run similar tasks [50, 34]; (3) measuring the Protected VM while pausing all other co-located VMs [22, 47]. The drawback of (1) and (2) is that it only works for programs with predictable and stable performance characteristics, and does not support arbitrary programs running in the Protected VM. The problem with (3) is the significant performance overhead inflicted on co-located VMs. In contrast, we use novel execution throttling of the co-located VMs to collect the Protected VM’s baseline (reference) measurements with negligible performance overhead. While execution throttling has been used to achieve resource fairness in prior work [49, 20]; using it to collect Reference samples at runtime is, to our knowledge, novel.

QoS-aware VM scheduling.  Prior research propose to predict interference between different applications (or VMs) by profiling their resource usage offline and then statically scheduling them to different servers if co-locating them will lead to excessive resource contention [19, 53, 47]. The underlying assumption is that applications (or VMs), when deployed on the cloud servers, will not change their resource usage patterns. Unfortunately, these approaches fall short in defense against malicious applications, who can reduce their resource uses during the profiling stage, then run memory DoS attacks when deployed, thus evading these QoS scheduling mechanisms.

Load-triggered VM migration.  Some studies propose to monitor the resource consumption of guest VMs or the entire server in real-time, and migrate VMs to different processor packages or servers when there is severe resource contention [17, 55, 13, 40]. By doing so these approaches can dynamically balance the workload among multiple packages or servers when some of them are overloaded, and achieve an optimal resource allocation. While they work well for performance optimization of a set of fully-loaded servers, they fail to detect carefully-crafted memory DoS attacks. First, the metrics in their methods cannot be used to detect the existence of memory DoS attacks. These works measure the LLC miss rate [17, 55] or memory bandwidth [13, 40] of guest VMs or the whole server. A high LLC miss rate or memory bandwidth indicates severe resource contention for the VMs or servers. However, a memory DoS attack does not need to cause high LLC miss rate or memory bandwidth in order to degrade a victim’s performance. For instance, atomic locking attacks lock the bus temporarily but frequently, which could incur decreased LLC accesses and LLC misses in the victim VM. Our experiments show the victim VM’s LLC miss rate does not change, and its memory bandwidth is even decreased. So such micro-architectural measurement can never reveal severe resource contention, or trigger VM migration in the above approaches. Second, these approaches aim to balance the system’s performance. So they cannot guarantee to choose the victim or attacker VMs for migration when they achieve optimal workload placement. For instance, Adaptive LLC cleaning attacks can increase victim VMs’ LLC miss rate. However, the above approaches have no means to figure out that the victim VM has the strongest desire for migration. It is possible that there exists another VM, which has even higher LLC miss rate than the victim VM, due to its internal execution behaviors, not the interaction and contention with the attacker. So the above approaches will migrate this VM instead of the victim VM, as this VM has the highest miss rate. Then the victim VM will still suffer LLC cleaning attacks.

Performance isolation.  While cloud providers can offer single-tenant machines to customers with high demand for security and performance, disallowing resource sharing by VMs will lead to low resource utilization and thus is at odds with the cloud business model. Another option is to partition memory resources to enforce performance isolation on shared resources (e.g., LLC [42, 37, 26, 18, 25, 5], or DRAM [32, 33, 41]). These works aim to achieve fairness between different domains and provide fair QoS. However, they cannot effectively defeat memory DoS attacks. For cache partitioning, software page coloring methods [26] can cause significant wastage of LLC space, while hardware cache partitioning mechanisms have insufficient partitions (e.g., Intel Cache Allocation Technology [5] only provides four QoS partitions on the LLC). Furthermore, LLC cache partitioning methods cannot resolve atomic locking attacks.

To summarize, existing solutions fail to address memory DoS attacks because they assume benign applications with non-malicious behaviors. Also, they are often tailored to only one type of attack so that they cannot be generalized to all memory DoS attacks, unlike our proposed defense.

7 Conclusions

We presented memory DoS attacks, in which a malicious VM intentionally induces memory resource contention to degrade the performance of co-located victim VMs. We proposed several advanced techniques to conduct such attacks, and demonstrate the severity of the resulting performance degradation. Our attacks work on modern memory systems in cloud servers, for which prior attacks on older memory systems are often ineffective. We evaluated our attacks against two commonly used applications in a public cloud, Amazon EC2, and show that the adversary can cause significant performance degradation to not only co-located VMs, but to the entire distributed application.

We then designed a novel and generalizable method that can detect and mitigate all known memory DoS attacks. Our approach collects the Protected VM’s reference and monitored behaviors at runtime using the Performance Monitor Unit. This is done by establishing a pseudo isolated collection environment by using the duty-cycle modulation feature to throttle the co-resident VMs for collecting Reference samples. Statistical tests are performed to detect differing performance probability distributions between Reference and Monitored samples, with desired confidence levels. Our evaluation shows this defense can detect and defeat memory DoS attacks with very low performance overhead.


  1. Ab - the apache software foundation.
  2. Amazon CloudWatch.
  3. Amazon virtual private cloud.
  4. Google Stackdriver.
  5. Improving real-time performance by utilizing cache allocation technology.
  6. Intel 64 and ia-32 architectures software developer’s manual, volume 3: System programming guide.
  7. Magento: ecommerce software and ecommerce platform.
  8. memtier benchmark.
  9. Microsoft Azure Application Insights.
  10. Spec cpu 2006.
  11. Sysbench: a system performance benchmark.
  12. Welcome to the httperf homepage.
  13. J. Ahn, C. Kim, J. Han, Y.-R. Choi, and J. Huh. Dynamic virtual machine scheduling in clouds for architectural shared resources. In USENIX Conference on Hot Topics in Cloud Computing, 2012.
  14. S. Alarifi and S. D. Wolthusen. Robust coordination of cloud-internal denial of service attacks. In Intl. Conf. on Cloud and Green Computing, 2013.
  15. H. S. Bedi and S. Shiva. Securing cloud infrastructure against co-resident DoS attacks using game theoretic defense mechanisms. In Intl. Conf. on Advances in Computing, Communications and Informatics, 2012.
  16. C. Bienia. Benchmarking Modern Multiprocessors. PhD thesis, Princeton University, 2011.
  17. S. Blagodurov, S. Zhuravlev, A. Fedorova, and A. Kamali. A case for numa-aware contention management on multicore systems. In ACM Intl. Conf. on Parallel architectures and compilation techniques.
  18. H. Cook, M. Moreto, S. Bird, K. Dao, D. A. Patterson, and K. Asanovic. A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness. In Intl. Symp. on Computer Architecture, 2013.
  19. C. Delimitrou and C. Kozyrakis. Paragon: Qos-aware scheduling for heterogeneous datacenters. In Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, 2013.
  20. E. Ebrahimi, C. J. Lee, O. Mutlu, and Y. N. Patt. Fairness via source throttling: A configurable and high-performance fairness substrate for multi-core memory systems. In Architectural Support for Programming Languages and Operating Systems, 2010.
  21. D. Grunwald and S. Ghiasi. Microarchitectural denial of service: Insuring microarchitectural fairness. In ACM/IEEE Intl. Symp. on Microarchitecture, 2002.
  22. A. Gupta, J. Sampson, and M. B. Taylor. Quality time: A simple online technique for quantifying multicore execution efficiency. In IEEE Intl. Symp. on Performance Analysis of Systems and Software, 2014.
  23. Q. Huang and P. P. Lee. An experimental study of cascading performance interference in a virtualized environment. SIGMETRICS Perform. Eval. Rev., 2013.
  24. P. Jamkhedkar, J. Szefer, D. Perez-Botero, T. Zhang, G. Triolo, and R. B. Lee. A framework for realizing security on demand in cloud computing. In Conf. on Cloud Computing Technology and Science, 2013.
  25. D. Kim, H. Kim, and J. Huh. vcache: Providing a transparent view of the llc in virtualized environments. Computer Architecture Letters, 2014.
  26. T. Kim, M. Peinado, and G. Mainar-Ruiz. Stealthmem: System-level protection against cache-based side channel attacks in the cloud. In USENIX Security Symp., 2012.
  27. F. Liu, Y. Yarom, Q. Ge, G. Heiser, and R. B. Lee. Last-level cache side-channel attacks are practical. In IEEE Symp. on Security and Privacy, 2015.
  28. H. Liu. A new form of DoS attack in a cloud and its avoidance mechanism. In ACM Workshop on Cloud Computing Security, 2010.
  29. L. Liu, Z. Cui, M. Xing, Y. Bao, M. Chen, and C. Wu. A software memory partition approach for eliminating bank-level interference in multicore systems. In Intl. Conf. on Parallel Architectures and Compilation Techniques, 2012.
  30. F. J. Massey Jr. The kolmogorov-smirnov test for goodness of fit. Journal of the American statistical Association, 1951.
  31. J. D. McCalpin. Stream: Sustainable memory bandwidth in high performance computers.
  32. T. Moscibroda and O. Mutlu. Memory performance attacks: Denial of memory service in multi-core systems. In USENIX Security Symp., 2007.
  33. S. P. Muralidhara, L. Subramanian, O. Mutlu, M. Kandemir, and T. Moscibroda. Reducing memory interference in multicore systems via application-aware memory channel partitioning. In ACM/IEEE Intl. Symp. on Microarchitecture, 2011.
  34. D. Novaković, N. Vasić, S. Novaković, D. Kostić, and R. Bianchini. Deepdive: Transparently identifying and managing performance interference in virtualized environments. In USENIX Conf. on Annual Technical Conference, 2013.
  35. N. Poggi, D. Carrera, R. Gavalda, and E. Ayguade. Non-intrusive estimation of qos degradation impact on e-commerce user satisfaction. In IEEE Intl. Symp. on Network Computing and Applications, 2011.
  36. T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In ACM Conf. on Computer and Communications Security, 2009.
  37. D. Sanchez and C. Kozyrakis. Vantage: Scalable and efficient fine-grain cache partitioning. In AMC Intl. Symp. on Computer Architecture, 2011.
  38. V. Varadarajan, T. Kooburat, B. Farley, T. Ristenpart, and M. M. Swift. Resource-freeing attacks: Improve your cloud performance (at your neighbor’s expense). In ACM Conf. on Computer and Communications Security, 2012.
  39. V. Varadarajan, Y. Zhang, T. Ristenpart, and M. Swift. A placement vulnerability study in multi-tenant public clouds. In USENIX Security Symp., 2015.
  40. H. Wang, C. Isci, L. Subramanian, J. Choi, D. Qian, and O. Mutlu. A-drm: Architecture-aware distributed resource management of virtualized clusters. In ACM Intl. Conference on Virtual Execution Environments, 2015.
  41. Y. Wang, A. Ferraiuolo, and G. E. Suh. Timing channel protection for a shared memory controller. In IEEE Intl. Symp. on High Performance Computer Architecture, 2014.
  42. Z. Wang and R. B. Lee. New cache designs for thwarting software cache-based side channel attacks. In ACM Intl. Symp. on Computer Architecture, 2007.
  43. D. H. Woo and H.-H. S. Lee. Analyzing performance vulnerability due to resource denial-of-service attack on chip multiprocessors. In Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2007.
  44. Z. Wu, Z. Xu, and H. Wang. Whispers in the hyper-space: High-speed covert channel attacks in the cloud. In USENIX Security Symp., 2012.
  45. Y. Xu, M. Bailey, F. Jahanian, K. Joshi, M. Hiltunen, and R. Schlichting. An exploration of L2 cache covert channels in virtualized environments. In ACM Workshop on Cloud computing security, 2011.
  46. Z. Xu, H. Wang, and Z. Wu. A measurement study on co-residence threat inside the cloud. In USENIX Security Symp., 2015.
  47. H. Yang, A. Breslow, J. Mars, and L. Tang. Bubble-flux: Precise online qos management for increased utilization in warehouse scale computers. In ACM Intl. Symp. on Computer Architecture, 2013.
  48. T. Zhang and R. B. Lee. Cloudmonatt: An architecture for security health monitoring and attestation of virtual machines in cloud computing. In ACM Intl. Symp. on Computer Architecture, 2015.
  49. X. Zhang, S. Dwarkadas, and K. Shen. Hardware execution throttling for multi-core resource management. In USENIX Annual Technical Conference, 2009.
  50. X. Zhang, E. Tune, R. Hagmann, R. Jnagal, V. Gokhale, and J. Wilkes. Cpi2: Cpu performance isolation for shared compute clusters. In ACM European Conf. on Computer Systems, 2013.
  51. Y. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart. Cross-VM side channels and their use to extract private keys. In ACM Conf. on Computer and Communications Security, 2012.
  52. Y. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart. Cross-tenant side-channel attacks in PaaS clouds. In ACM Conf. on Computer and Communications Security, 2014.
  53. Y. Zhang, M. A. Laurenzano, J. Mars, and L. Tang. Smite: Precise qos prediction on real-system smt processors to improve utilization in warehouse scale computers. In IEEE/ACM Intl. Symp. on Microarchitecture, 2014.
  54. F. Zhou, M. Goel, P. Desnoyers, and R. Sundaram. Scheduler vulnerabilities and coordinated attacks in cloud computing. In IEEE Intl. Symp. on Network Computing and Applications, 2011.
  55. S. Zhuravlev, S. Blagodurov, and A. Fedorova. Addressing shared resource contention in multicore processors via scheduling. In Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, 2010.
This is a comment super asjknd jkasnjk adsnkj
The feedback cannot be empty
Comments 0
The feedback cannot be empty
Add comment

You’re adding your first comment!
How to quickly get a good reply:
  • Offer a constructive comment on the author work.
  • Add helpful links to code implementation or project page.