Get Your Workload in Order: Game Theoretic Prioritization of Database Auditing


The quantity of personal data that is collected, stored, and subsequently processed continues to grow at a rapid pace. Given its potential sensitivity, ensuring privacy protections has become a necessary component of database management. To enhance protection, a number of mechanisms have been developed, such as audit logging and alert triggers, which notify administrators about suspicious activities that may require investigation. However, this approach to auditing is limited in several respects. First, the volume of such alerts grows with the size of the database and is often substantially greater than the capabilities of resource-constrained organizations. Second, strategic attackers can attempt to disguise their actions or carefully choose which records they touch, such as by limiting the number of database accesses they commit, thus potentially hiding illicit activity in plain sight. In this paper, we introduce a novel approach to database auditing that explicitly accounts for adversarial behavior by 1) prioritizing the order in which types of alerts are investigated and 2) providing an upper bound on how much resource to allocate for each type.

Specifically, we model the interaction between a database auditor and potential attackers as a Stackelberg game in which the auditor chooses a (possibly randomized) auditing policy and attackers choose which, if any, records to target. Upon doing so, we show that even a highly constrained version of the auditing problem is NP-Hard. Based on this finding, we introduce an approach that combines linear programming, column generation, and heuristic search to derive an auditing policy. On synthetic data, we perform an extensive evaluation of both how closely our solution approximates the optimal one and the computational cost of our approach. Two real datasets, 1) 1.5 months of audit logs from the electronic medical record system of Vanderbilt University Medical Center and 2) a publicly available credit card application dataset of 1000 records, are used to test the policy-searching performance. The findings illustrate that our methods produce high-quality mixed strategies as database audit policies, and our general approach significantly outperforms non-game-theoretic baselines.

I Introduction

Modern computing and storage technology has made it possible to create ad hoc database systems with the ability to collect, store, and process extremely detailed information about the daily activities of individuals [mcafee2012]. These database systems hold great value for society, but accordingly face challenges to security and, ultimately, personal privacy. The sensitive nature of the data stored in such systems attracts malicious attackers who can gain value by disrupting them in various ways (e.g., stealing sensitive information, commandeering computational resources, committing financial fraud, and simply shutting the system down) [Ablon:2014:MCT:2685860]. It is evident that the severity and frequency of attack events continue to grow. Notably, the most recent breach at Equifax led to the exposure of data on 143 million Americans, including credit card numbers, Social Security numbers, and other information that could be used for identity theft or other illicit purposes [cnn2017equifax]. Even more concerning is that the exploit of the system continued for at least two months before it was discovered.

While complex access control systems have been developed for database management, it has been recognized that in practice no database system will be impervious to attack [Ten10]. As such, prospective technical protections need to be complemented by retrospective auditing mechanisms, a notion that has been well recognized by the database community [kuna2014outlier]. Though audits do not directly prevent attacks in their own right, they may allow for the discovery of breaches that can be followed up on before they escalate to full-blown exploits by adversaries originating from beyond, as well as within, an organization.

In the general situation of database management, auditing relies heavily on the performance of a threat detection and misuse tracking (TDMT) module, which raises real-time alerts based on the actions committed to a system for further investigation by experts. In general, the alert types are specifically predefined by the administrative officials in ad hoc applications. For instance, in the healthcare domain, organizations are increasingly reliant on electronic medical record (EMR) systems for anytime, anywhere access to a patient’s health status [Hsiao2013]. Given the complex and dynamic nature of healthcare, these organizations often grant employees broad access privileges, which increases the potential risk that inside employees illegally exploit the EMRs of patients [Gunter11]. To detect when a specific access to a patient’s medical record is a potential policy violation, healthcare organizations use various triggers to generate alerts, which can be based on predefined rules (e.g., when an access is made to a designated very important person). As a consequence, the detected anomalies, which indicate deviations from routine behavior (e.g., when a pediatrician accesses the records of elderly individuals), can be checked by privacy officials [Agrawal:07].

Although TDMTs are widely deployed in database systems as both detection and deterrence tools, security and privacy have not been sufficiently guaranteed. The utility of TDMTs in practice is currently limited by the fact that they often lead to a very large number of alerts, whereas the number of actual violations tends to be quite small. This is particularly problematic because the large quantities of false alarms can easily overwhelm the capacity of the administrative officials who are expected to follow up on them, but have limited resources at their disposal [Rostad06]. One typical example is the observation from our evaluation dataset: at Vanderbilt University Medical Center, on any workday, the volume of accesses to the EMR system is around 1.8 million, of which more than 30,000 trigger alerts of varying predefined types, which far exceeds the capacity of privacy officials. Therefore, in lieu of an efficient audit functionality in the database systems, TDMTs are not optimized for detecting suspicious behavior.

Given the overwhelming number of alerts in comparison to available auditing resource and the need to catch attackers, the core query function invoked by an administrator must consider resource constraints. And, given such constraints, we must determine which triggered alerts should be recommended for investigation. One intuitive way to proceed is to prioritize alert categories based on potential impact of a violation, if one were to be found. However, this is an inadequate strategy because would-be violators can be strategic and, thus, reason about the specific violations they can perform so that they balance the chance of being audited with the benefits of the violation. To address this challenge, we introduce a model based on a Stackelberg game, in which an auditor chooses a randomized auditing policy, while potential violators choose their victims (such as which medical records to view) or to refrain from malicious behavior after observing the auditing policy.

Specifically, our model restricts the space of audit policies to consider two dimensions: 1) how to prioritize alert categories and 2) how much resource to allocate to each category. We show that even a highly restricted version of the auditor’s problem is NP-Hard. Nevertheless, we propose a series of algorithmic methods for solving these problems, leveraging a combination of linear programming and column generation to compute an optimal randomized policy for prioritizing alert categories. We perform an extensive experimental evaluation with two real datasets—one involving EMR access alerts and the other pertaining to credit card eligibility decisions—the results of which demonstrate the effectiveness of our approach.

The remainder of the paper is organized as follows. In Section II, we formally define the game theoretic alert prioritization problem and prove its NP-hardness. In Section III, we describe the algorithmic approaches for computing a randomized audit policy. In Section IV, we introduce a synthetic dataset to show, in a controlled manner, the effectiveness of our methods for approximating the optimal solution with dramatic gains in efficiency. In Section V, we use two real datasets (from healthcare and finance) that rely upon predefined alert types to show that our methods lead to high-quality audit strategies. In Section VI, we discuss related work on alert processing in database systems, security and game theory, and audit games. We discuss our findings and conclude this paper in Section VII.

II Game Theoretic Model of Alert Prioritization

In environments dealing with sensitive data or critical services, it is common to deploy TDMTs to raise alerts upon observing suspicious events. By defining ad hoc alert types, each suspicious event can be marked with an alert label, or type, and put into an audit bin corresponding to this type. Typically, the vast majority of the raised alerts do not correspond to actual attacks, as they are generated as a part of routine workflow that is too complex to accurately capture. Consequently, looking for actual violations amounts to looking for needles in a large haystack of alerts, and inspecting all, or even a large proportion of, alerts that are typically generated is rarely feasible. A crucial consideration, therefore, is how to prioritize alerts, choosing a subset that can be audited given a specified auditing budget from a vast pool of possibilities. The prioritization problem is complicated by the fact that intelligent adversaries—that is, would-be violators of organizational access policies—would react to an auditing policy by changing their behavior to balance the gains from violations, and the likelihood, and consequences, of detection.

We proceed to describe a formal model of alert prioritization as a game between an auditor, who chooses an alert prioritization policy, and multiple attackers, who determine the nature of violation, or are deterred from one, in response. In the described scenarios, we assume that the attackers have complete information. For reference purposes, the symbols used throughout this paper are described in Table I.

Symbols Interpretation
$T$ Set of alert types
$E$ Set of entities or users causing events
$V$ Set of records or files available for access
$P^t_{ev}$ Probability of raising type $t$ alert by attack $\langle e, v \rangle$
$C_t$ Cost for auditing an alert of type $t$
$B$ Auditing budget
$F_t(n)$ Probability that at most $n$ alerts are in type $t$
$O$ Set of all alert prioritizations over $T$
$Z_t$ Number of alerts under type $t$
$b_t$ Budget threshold assigned for auditing type $t$
$R(\langle e, v \rangle)$ Adversary’s gain when attack $\langle e, v \rangle$ is undetected
$M(\langle e, v \rangle)$ Adversary’s penalty when attack $\langle e, v \rangle$ is captured
$K(\langle e, v \rangle)$ Cost of deploying attack $\langle e, v \rangle$
$p_o$ Probability of choosing an alert prioritization $o$
$p_e$ Probability that $e$ is a potential adversary
TABLE I: A legend of the notation used in this paper.

II-A System Model

Let $E$ be the set of potential adversaries, such as employees in a healthcare organization, some of whom could be potential violators of privacy policies, and $V$ be the set of potential victims, such as patients in a healthcare facility. We define events, as well as attacks, by a tuple $\langle e, v \rangle$ with $e \in E$ and $v \in V$. A subset of these events will trigger alerts. Now, let $T$ be the set of alert types, or categorical labels assigned to different kinds of suspicious behavior. For example, a doctor viewing a record for a patient not assigned to them and a nurse viewing the EMR for another nurse (who is also a patient) in the same healthcare facility could trigger two distinct alert types. We assume that each event $\langle e, v \rangle$ maps to at most one alert type $t \in T$. This mapping may be stochastic; that is, given an event $\langle e, v \rangle$, an alert of type $t$ is triggered with probability $P^t_{ev}$, and no alert is triggered otherwise (i.e., $\sum_{t \in T} P^t_{ev} \le 1$ for all $e \in E$, $v \in V$). Typically, both the categorization of alerts and the corresponding mapping between events and types are given (for example, through predefined rules). If not, they can be inferred by generating possible attacks and inspecting how they are categorized by the TDMT. Auditing each alert is time consuming and the time to audit an alert can vary by alert type. Let $C_t$ be the cost (e.g., time) of auditing a single alert of type $t$ and let $B$ be the total budget allocated for auditing.

Normal events resulting in alerts arrive based on a distribution reflecting a typical workflow of the organization. We assume this distribution is known, represented by $F_t(n)$, which is the probability that at most $n$ alerts of type $t$ are generated. If we make the reasonable assumption that attacks are rare events and that the alert logs are made tamper-proof through appropriate techniques, then this distribution can be obtained from historical alert logs. It is noteworthy that, in sensitive settings (e.g., the EMR system or the credit card application program), the probability that adversaries successfully manipulate this distribution to fool the audit model is almost zero. The cost of orchestrating and implementing such attacks is much higher than what could be gained from running a few undetected attacks.
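As an illustration of how such a distribution might be estimated from historical logs, the following minimal sketch builds an empirical cumulative distribution from daily alert counts for a single alert type. The function name `empirical_cdf` and the sample counts are hypothetical; the paper does not prescribe a particular estimator.

```python
from bisect import bisect_right


def empirical_cdf(daily_counts):
    """Return F(n): the fraction of historical days with at most n alerts."""
    ordered = sorted(daily_counts)
    total = len(ordered)

    def F(n):
        # bisect_right gives the number of observations that are <= n.
        return bisect_right(ordered, n) / total

    return F


# Hypothetical daily alert counts for one alert type.
history = [3, 5, 4, 6, 5, 7, 4, 5]
F_t = empirical_cdf(history)
print(F_t(4), F_t(7))
```

In practice one such function would be estimated per alert type, under the stated assumption that the logged counts are dominated by benign activity.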

II-B Game Model

We model the interaction between the auditor and potential violators as a Stackelberg game. Informally, the auditor chooses a possibly randomized auditing policy, which is observed by the prospective violators, who in response choose the nature of the attack, if any. Both decisions are made before the alerts produced through normal workflow are generated according to a known stochastic process $F_t(n)$.

In general, a specific pure strategy of the defender (auditor) is a mapping from an arbitrary realization of alert counts of all types to a subset of alerts that are to be inspected, abiding by a constraint on the total amount of budget $B$ allocated for auditing alerts. Even representing a single such strategy is intractable, let alone optimizing in the space of randomizations over these. We therefore restrict the defender strategy space in two ways. First, we let pure strategies involve an ordering $o = (o_1, o_2, \ldots, o_{|T|})$ (for all $i, j \in \mathbb{Z}^{+}$ with $i, j \le |T|$, if $i \ne j$, then $o_i \ne o_j$) over alert types, where the subscript indicates the position in the ordering, and a vector of thresholds $\mathbf{b} = (b_1, \ldots, b_{|T|})$, with $b_t$ being the maximum budget available for auditing alerts in category $t$. Let $O$ be the set of feasible orderings, which may be a subset of all possible orders over types (e.g., organizational policy may impose constraints, such as always prioritizing some alert categories over others). We interpret a threshold $b_t$ as the maximum budget allocated to $t$; thus, the most alerts of type $t$ that can be inspected is $\lfloor b_t / C_t \rfloor$. Second, we allow the auditor to choose a randomized policy over alert orderings, with $p_o$ being the probability that ordering $o$ over alert types is chosen, whereas the thresholds $\mathbf{b}$ are deterministic and independent of the chosen alert priorities.

We have a collection of potential adversaries $E$, each of whom may target any potential victim $v \in V$. We assume that the adversary will target exactly one victim (or at most one, if $V$ contains an option of not attacking anyone). Thus, the strategy space of each adversary $e$ is $V$. In addition, we assume that any given potential adversary is actually unlikely to consider attacking. We formalize this by introducing a probability $p_e$ that an attack by $e$ is considered at all (i.e., $e$ does not even consider attacking with probability $1 - p_e$).

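Suppose we fix a prioritization $o$ and thresholds $\mathbf{b}$. Let $o(t)$ be the position of alert type $t$ in $o$ and $o_i$ be the alert type in position $i$ in the order. Let $B_t(o, \mathbf{b}, \mathbf{Z})$ be the budget remaining to inspect alerts of type $t$ if the order is $o$, the defender uses alert type thresholds $\mathbf{b}$, and the vector of realizations of benign alert type counts is $\mathbf{Z} = (Z_1, \ldots, Z_{|T|})$. Then we have

$$B_t(o, \mathbf{b}, \mathbf{Z}) = \max\left\{ B - \sum_{i=1}^{o(t)-1} \min\left\{ b_{o_i},\, Z_{o_i} C_{o_i} \right\},\ 0 \right\}.$$

Now, let us take a moment to unpack this expression for context. For each audited alert type $o_i$ that precedes $t$, we compare the threshold $b_{o_i}$ with $Z_{o_i} C_{o_i}$, the cost of auditing every alert of that type, to determine how much budget will be left for the types that follow in the priority order. If the total budget that is eaten by inspecting alerts prior to $t$ is larger than $B$, $B_t(o, \mathbf{b}, \mathbf{Z})$ returns $0$, and no alerts of type $t$ will be inspected. Next, we can compute the number of alerts of type $t$ that are audited as

$$n_t(o, \mathbf{b}, \mathbf{Z}) = \min\left\{ \left\lfloor \frac{B_t(o, \mathbf{b}, \mathbf{Z})}{C_t} \right\rfloor,\ \left\lfloor \frac{b_t}{C_t} \right\rfloor,\ Z_t \right\}.$$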
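To make the bookkeeping concrete, the sketch below computes the remaining budget for a type and the resulting number of audited alerts for one realization of alert counts. It assumes the reading described above: each higher-priority type consumes the smaller of its threshold and the cost of all of its alerts. The function names, dictionary layout, and numbers are hypothetical.

```python
import math


def remaining_budget(order, thresholds, counts, costs, budget, t):
    """Budget left for type t after funding every type ranked before it."""
    spent = 0
    for earlier in order:
        if earlier == t:
            break
        # A higher-priority type uses its threshold or the cost of auditing
        # all of its alerts, whichever is smaller.
        spent += min(thresholds[earlier], counts[earlier] * costs[earlier])
    return max(budget - spent, 0)


def audited_count(order, thresholds, counts, costs, budget, t):
    """Number of type-t alerts audited under the order and thresholds."""
    left = remaining_budget(order, thresholds, counts, costs, budget, t)
    return min(
        math.floor(left / costs[t]),           # what the leftover budget allows
        math.floor(thresholds[t] / costs[t]),  # what the threshold allows
        counts[t],                             # alerts actually present
    )


# Hypothetical instance with three alert types and unit audit costs.
order = ["A", "B", "C"]
thresholds = {"A": 3, "B": 2, "C": 4}
counts = {"A": 5, "B": 1, "C": 6}
costs = {"A": 1, "B": 1, "C": 1}
audited = {t: audited_count(order, thresholds, counts, costs, 6, t) for t in order}
print(audited)
```

Here type B has only one alert, so it returns unused budget to type C even though its threshold is 2.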

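Suppose that an attack generates an alert of type $t$. As noted earlier, we assume that the number of alerts generated due to attacks is a negligible proportion of all generated alerts (e.g., when the $p_e$ are small). Then, the probability that an alert of type $t$ generated through an attack is detected is approximately

$$P_{al}(o, \mathbf{b}, t) \approx \mathbb{E}_{\mathbf{Z}}\left[ \frac{n_t(o, \mathbf{b}, \mathbf{Z})}{Z_t} \right],$$

where $n_t(o, \mathbf{b}, \mathbf{Z})$ is the number of audited alerts of type $t$ and $Z_t$ is the number of alerts of that type. We can further approximate this probability by sampling from the joint distribution over alert type counts $\mathbf{Z}$.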
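The sampling approximation can be sketched as a simple Monte Carlo average of the audited fraction. This is an illustrative sketch only: the sampler interface `sample_counts(rng)`, the independent rounded Gaussians, and the choice to let samples with no alerts of the type contribute zero are all assumptions made here, not details taken from the paper.

```python
import math
import random


def audited_count(order, thresholds, counts, costs, budget, t):
    """Audited alerts of type t for one realization of alert counts."""
    spent = 0
    for earlier in order:
        if earlier == t:
            break
        spent += min(thresholds[earlier], counts[earlier] * costs[earlier])
    left = max(budget - spent, 0)
    return min(math.floor(left / costs[t]),
               math.floor(thresholds[t] / costs[t]),
               counts[t])


def detection_probability(order, thresholds, costs, budget, t,
                          sample_counts, n_samples=1000, seed=0):
    """Monte Carlo estimate of the expected audited fraction of type t."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        counts = sample_counts(rng)
        if counts[t] > 0:  # assumption: empty samples contribute zero
            total += audited_count(order, thresholds, counts,
                                   costs, budget, t) / counts[t]
    return total / n_samples


def sample_counts(rng):
    """Hypothetical sampler: independent rounded Gaussians per type."""
    params = {"A": (6, 2.0), "B": (5, 1.6), "C": (4, 1.3)}
    return {t: max(0, round(rng.gauss(m, s))) for t, (m, s) in params.items()}


order = ["A", "B", "C"]
thresholds = {"A": 3, "B": 2, "C": 4}
costs = {"A": 1, "B": 1, "C": 1}
p_hat = detection_probability(order, thresholds, costs, 6, "A", sample_counts)
print(p_hat)
```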

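The adversary does not directly choose alert types, but rather the victims $v$ (e.g., an EMR). The probability of detecting an attack $\langle e, v \rangle$ under audit order $o$ and audit thresholds $\mathbf{b}$ is then

$$P_{at}(o, \mathbf{b}, \langle e, v \rangle) = \sum_{t \in T} P^t_{ev}\, P_{al}(o, \mathbf{b}, t).$$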

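We now have sufficient preliminaries to define the utility functions of the adversaries $e \in E$. Let $M(\langle e, v \rangle)$ denote the penalty of the adversary when captured by the auditor, $R(\langle e, v \rangle)$ denote the benefit if the adversary is not audited, and $K(\langle e, v \rangle)$ the cost of an attack. One natural example ties these quantities to the alert types, using a Boolean indicator of the presence of each alert type and the severity of that alert category. The utility of the adversary is then

$$U_a(o, \mathbf{b}, \langle e, v \rangle) = -P_{at}(o, \mathbf{b}, \langle e, v \rangle)\, M(\langle e, v \rangle) + \bigl(1 - P_{at}(o, \mathbf{b}, \langle e, v \rangle)\bigr)\, R(\langle e, v \rangle) - K(\langle e, v \rangle),$$

where $P_{at}(o, \mathbf{b}, \langle e, v \rangle)$ is the probability that the attack is detected and the penalty $M$ is taken as a non-negative magnitude.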

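We assume that the game is zero-sum. Thus, the auditor’s goal is to find a randomized strategy $p_o$ and type-specific thresholds $\mathbf{b}$ to minimize the expected utility of the adversary:

$$\min_{p,\, \mathbf{b}}\ \sum_{e \in E} p_e \max_{v \in V} \sum_{o \in O} p_o\, U_a(o, \mathbf{b}, \langle e, v \rangle).$$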

We call this optimization challenge the optimal auditing problem (OAP).
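The objective can be evaluated directly for any candidate policy. The sketch below does so for a given mixed strategy, assuming the adversary's utility is the benefit weighted by the probability of escaping detection, minus the penalty weighted by the probability of detection, minus the attack cost (with the penalty given as a non-negative number). All function names, dictionary layouts, and numbers are hypothetical.

```python
def attack_detection(p_type_given_attack, p_alert):
    """Detection probability of an attack: sum over types of
    P(alert type | attack) * P(type-t alert is audited)."""
    return sum(p * p_alert[t] for t, p in p_type_given_attack.items())


def adversary_utility(p_detect, penalty, benefit, cost):
    """Expected utility of one attack; penalty is a non-negative magnitude."""
    return -p_detect * penalty + (1 - p_detect) * benefit - cost


def auditor_loss(p_attack, mixed_strategy, utility):
    """Sum over adversaries of p_e times the best-response expected utility.

    utility[o][e][v] is the adversary utility under ordering o.
    """
    any_order = next(iter(utility))
    total = 0.0
    for e, p_e in p_attack.items():
        best = max(
            sum(p_o * utility[o][e][v] for o, p_o in mixed_strategy.items())
            for v in utility[any_order][e]
        )
        total += p_e * best
    return total


# Hypothetical numbers: one adversary, two victims, two orderings.
p_det = attack_detection({"A": 0.5, "B": 0.5}, {"A": 0.6, "B": 0.2})
u = adversary_utility(0.5, penalty=4, benefit=3, cost=0.4)
utility = {
    "o1": {"e1": {"v1": 2.0, "v2": 0.0}},
    "o2": {"e1": {"v1": -1.0, "v2": 1.0}},
}
loss = auditor_loss({"e1": 0.5}, {"o1": 0.5, "o2": 0.5}, utility)
print(p_det, u, loss)
```

Evaluating the loss is cheap once the utilities are tabulated; the difficulty of the OAP lies in searching over the mixed strategy and the thresholds.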

Since the game is zero-sum, the optimal auditing policy can be computed using the following mathematical program, which directly extends the standard linear programming formulation for computing mixed-strategy Nash equilibria in zero-sum games:

$$\begin{aligned} \min_{\mathbf{b},\, p,\, \mathbf{u}} \quad & \sum_{e \in E} p_e u_e \\ \text{s.t.} \quad & u_e \ge \sum_{o \in O} p_o\, U_a(o, \mathbf{b}, \langle e, v \rangle) \quad \forall e \in E,\ v \in V, \\ & \sum_{o \in O} p_o = 1, \qquad p_o \ge 0 \quad \forall o \in O. \end{aligned} \tag{5}$$

Indeed, if we fix the decision variables $\mathbf{b}$, the formulation becomes a linear program. Nevertheless, since the set of all possible alert prioritizations is exponential, even this linear program has exponentially many variables. Furthermore, introducing decision variables $\mathbf{b}$ makes it non-linear and non-convex. Next, we show that solving this problem is NP-hard, even in a restricted special case. We prove this by reducing from the 0-1 Knapsack problem.

Definition 1 (0-1 Knapsack Problem)

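Let $I$ be a set of items where each item $i \in I$ has a weight $w_i$ and a value $a_i$, with $w_i$ and $a_i$ integers. $W$ is a budget on the total amount of weight (an integer). Question: given a threshold $\theta$, does there exist a subset of items $S \subseteq I$ such that $\sum_{i \in S} w_i \le W$ and $\sum_{i \in S} a_i \ge \theta$?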

Theorem 1

OAP is NP-hard even when is a singleton.

The proof of this theorem is in the Appendix.

III Solving the Alert Prioritization Game

There are two practical challenges that need to be addressed to compute useful approximate solutions to the OAP. First, there is an exponential set of possible orderings of alert types that needs to be considered to compute an optimal randomized strategy for choosing orderings. Second, there is a combinatorial space of possible choices for the threshold vectors $\mathbf{b}$. In this section, we develop a column generation approach for the linear program induced when we fix a threshold vector $\mathbf{b}$. We then introduce a search algorithm to compute the auditing thresholds.

III-A Column Generation Greedy Search

By fixing $\mathbf{b}$, Equation 5 becomes a linear program, albeit with an exponential number of variables. However, since the number of constraints is small, only a limited number of variables will be non-zero. The challenge is in finding this small basis. We do so using column generation, an approach in which we iteratively solve a linear program with a small subset of variables, and then add new variables with a negative reduced cost. We refer to this method as Column Generation Greedy Search (CGGS), the pseudocode for which is in Algorithm 1.

Specifically, we begin with a small subset of alert prioritizations $Q \subset O$. We solve the linear program induced after fixing $\mathbf{b}$ in Equation 5, restricted to columns in $Q$. For reference purposes, we call this the master problem, which is constructed at the start of each iteration of Algorithm 1. Next, we check if there exists a column (ordering over types) that improves upon the current best solution. Each decision variable, whether the probability $p_o$ of an ordering or the utility bound $u_e$ of an adversary, corresponds to one column of the constraint matrix, and its reduced cost is computed from the solution of the dual problem. By minimizing the reduced cost, we generate one new column in each iteration and add it to the subset of columns in the master problem. Within the process of generating a new column, the candidate column is the one associated with the audit order under construction. This process is repeated until we can prove that the minimum reduced cost is non-negative.
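As a sketch of where such reduced costs come from, suppose the master problem has the form below, with one constraint per event $\langle e, v \rangle$ and a normalization constraint on the ordering probabilities. The dual variables $\pi_{ev}$ and $\lambda$ and the symbol $\bar{c}_o$ are names introduced here for illustration and may differ from the authors' notation.

$$\begin{aligned} \text{(master)} \quad & \min_{p,\, \mathbf{u}}\ \sum_{e \in E} p_e u_e \\ & \text{s.t. } u_e - \sum_{o \in Q} p_o\, U_a(o, \mathbf{b}, \langle e, v \rangle) \ge 0 \quad \forall e, v \quad [\pi_{ev} \ge 0], \\ & \phantom{\text{s.t. }} \sum_{o \in Q} p_o = 1 \quad [\lambda \text{ free}], \qquad p_o \ge 0,\ u_e \text{ free}. \\ \text{(dual)} \quad & \max_{\pi,\, \lambda}\ \lambda \\ & \text{s.t. } \sum_{v \in V} \pi_{ev} = p_e \quad \forall e, \qquad \lambda \le \sum_{e, v} \pi_{ev}\, U_a(o, \mathbf{b}, \langle e, v \rangle) \quad \forall o \in Q, \qquad \pi_{ev} \ge 0. \\ \text{(reduced cost)} \quad & \bar{c}_o = \sum_{e, v} \pi_{ev}\, U_a(o, \mathbf{b}, \langle e, v \rangle) - \lambda. \end{aligned}$$

Under this reading, an ordering $o \notin Q$ with $\bar{c}_o < 0$ violates the corresponding dual constraint, so adding its column can lower the master objective; when no such ordering exists, the restricted solution is optimal for the full linear program.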

The subproblem of generating the best column is itself non-trivial. We address this subproblem through the application of a greedy algorithm for generating a reduced-cost-minimizing ordering over alert types. The intuition behind CGGS is that, in the process of generating a new audit order, we greedily add one alert type at a time to minimize the reduced cost given the order generated thus far. We continue until the objective (reduced cost) fails to improve.

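Input : The set $Q$ with a single random pure strategy for the auditor.
Output : The set of pure strategies $Q$.
while True do
      construct the master LP restricted to the orderings in $Q$; /* Construct LP using current Q */
      solve the dual of the master LP; /* Solve dual problem */
      start with an empty ordering $o'$;
      while appending an unused alert type to $o'$ lowers the reduced cost do
            append to $o'$ the alert type that minimizes the reduced cost;
      end while
      if the reduced cost of $o'$ is negative then
            add $o'$ to $Q$;
      else
            break;
      end if
end while
Algorithm 1 Column Generation Greedy Search (CGGS)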
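The greedy column construction at the heart of CGGS can be sketched in isolation. The snippet below is a simplified illustration, not the authors' implementation: it takes the reduced cost as a black-box callback (in the full method this would be derived from the dual solution of the master problem) and uses a hypothetical toy function in its place.

```python
def greedy_order(alert_types, reduced_cost):
    """Build an ordering by repeatedly appending the alert type that
    gives the lowest reduced cost, stopping when nothing improves."""
    order = []
    remaining = set(alert_types)
    best = reduced_cost(order)
    while remaining:
        # Sort for a deterministic tie-break among candidates.
        candidate = min(sorted(remaining),
                        key=lambda t: reduced_cost(order + [t]))
        value = reduced_cost(order + [candidate])
        if value >= best:
            break  # the reduced cost fails to improve
        order.append(candidate)
        remaining.remove(candidate)
        best = value
    return order, best


# Toy stand-in for the reduced cost: earlier positions count more.
weights = {"A": 3, "B": 1, "C": 2}


def toy_reduced_cost(order):
    return -sum(weights[t] / (i + 1) for i, t in enumerate(order))


order, value = greedy_order(weights, toy_reduced_cost)
print(order, value)
```

A complete column generation loop would alternate this step with re-solving the master linear program, adding the new ordering whenever its reduced cost is negative.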

III-B Iterative Shrink Heuristic Method

Armed with an approach for solving the linear program induced by a fixed budget threshold vector $\mathbf{b}$, we now develop a heuristic procedure to find alert type thresholds.

First, it should be recognized that $\sum_{t \in T} b_t \ge B$, because to allow otherwise would clearly waste auditing resources. Yet there is no explicit upper bound on the thresholds. However, given the distribution of the number of alerts $Z_t$ for an alert type $t$, we can obtain an approximate upper bound $J_t C_t$ on $b_t$, where $F_t(J_t) \approx 1$. This is possible because setting the thresholds above such bounds would lead to negligible improvement. Consequently, searching for a good solution can begin with a vector of audit thresholds such that, for each $t$, $b_t = J_t C_t$. Leveraging this intuition, we design a heuristic method, which iteratively shrinks the values of a good subset of audit thresholds according to a certain step size $\epsilon$. We refer to this as the Iterative Shrink Heuristic Method (ISHM), the pseudocode for which is provided in Algorithm 2.

In each atomic searching action, ISHM first strategically shrinks a subset of the thresholds. Next, it checks to see if this results in an improved solution. Two variables drive the search: one indicates the level (or the size) of the given subset of $\mathbf{b}$, and the other controls the step size.

At the beginning, the vector of audit thresholds is initialized with the approximate upper bounds. Then, starting at the first level, we consider shrinking each of the audit thresholds $b_t$ individually. The coefficient for shrinking is instantiated with the predefined step size $\epsilon$. If the best value for the objective function among the candidate subsets at the current level shows an improvement after shrinking, then the shrink is accepted and the shrinking coefficient is made smaller by increasing the iteration counter. When no coefficient leads to improvement, we increase the level by one, which induces tests of threshold combinations at the same shrinking ratio. This logic is described in Algorithm 2.

Once an improvement occurs, the search course resets based on the current $\mathbf{b}$. The search terminates once the level exceeds the number of alert types $|T|$.

Note that for a single improvement, the worst-case time complexity is exponential in the number of alert types, because every subset of thresholds may need to be tested. Even so, our experiments show that ISHM performs well, both in terms of precision (how closely it approaches the optimal solution) and efficiency.

Input : Instance of the game, step size .
Output : Vector of audit thresholds .
1 Initialize with full coverages in ;
2 ; ;
3 while  do
4       ; /* Find combinations */
5       ;
6       for  to  do
7             ;
8             ; ;
9             for  to  do
10                   ;
11                   for  to  do
12                         ;
14                   end for
15                  ;
16                   /* Return LP objective value */
17                   if  then ; ;
19             end for
20            if  then
21                   ;
22                   ;
23                   /* Types in need of update */
24                   for  to  do ;
25                   break;
27             end if
28            ;
30       end for
31      if  then ;
32       else ;
34 end while
Algorithm 2 Iterative Shrink Heuristic Method (ISHM)
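The search pattern of ISHM can be sketched as follows. This is a simplified reading of the description above, not the authors' exact procedure: the shrink coefficient is assumed to be $1 - i\epsilon$ at step $i$, the budget constraint on the sum of thresholds is omitted, and the objective is a black-box callback standing in for the value of the linear program at fixed thresholds. All names and numbers are hypothetical.

```python
import math
from itertools import combinations


def ishm(upper_bounds, objective, eps):
    """Shrink subsets of thresholds while the objective keeps improving."""
    b = list(upper_bounds)
    best = objective(b)
    n = len(b)
    steps = math.ceil(1 / eps)
    level = 1  # size of the subset of thresholds shrunk together
    while level <= n:
        improved = False
        for i in range(1, steps + 1):
            ratio = max(0.0, 1 - i * eps)  # assumed shrink coefficient
            best_cand, best_val = None, best
            for subset in combinations(range(n), level):
                cand = list(b)
                for k in subset:
                    cand[k] = math.floor(b[k] * ratio)
                val = objective(cand)
                if val < best_val:
                    best_cand, best_val = cand, val
            if best_cand is not None:
                # Accept the best shrink and restart from single thresholds.
                b, best, improved = best_cand, best_val, True
                break
        level = 1 if improved else level + 1
    return b, best


def toy_objective(b):
    """Hypothetical stand-in for the LP value at thresholds b."""
    target = [4, 8]
    return sum((x - y) ** 2 for x, y in zip(b, target))


result = ishm([8, 8], toy_objective, 0.5)
print(result)
```

Because a shrink is accepted only when the objective strictly decreases and the thresholds are non-negative integers, the loop terminates; the number of objective evaluations per improvement can still grow exponentially with the number of alert types.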

IV Controlled Evaluation

To gain intuition into the potential for our methods, we evaluated the performance of the ISHM and CGGS approaches using a synthetic dataset, Syn_A. To enable comparison with an optimal solution, we use a relatively small synthetic dataset, but as will be clear, it is sufficient to illustrate the relationship between our methods and the optimal brute force solution.

To perform the analysis, we vary the audit budgets and step size of ISHM. In addition, we evaluate a combination of CGGS+ISHM (since the former is also an approximation), by again comparing to the optimal.

IV-A Data Overview

The dataset Syn_A consists of a set of potential attackers who perform accesses, a set of files, four predefined alert types, and a set of rules for triggering alerts if any access happens. Table II summarizes the information of Syn_A and related parameters in the corresponding scenario. We let the number of alerts for each type be distributed according to a Gaussian distribution with the mean and standard deviation reported in Table II(a). Since the numbers of alerts are integers, we discretize the x-axis of each alert type's cumulative distribution function and use the corresponding probabilities for each possible alert count. We consider the 99.5% probability coverage for each alert type to obtain a finite upper bound on alert counts.
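One way to carry out such a discretization is sketched below. The binning rule (unit-width bins centered on each integer) and the renormalization over the truncated support are assumptions made for illustration; the paper does not spell out its exact procedure. The example uses the Type 1 parameters from Table II(a).

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1 + math.erf((x - mean) / (std * math.sqrt(2))))


def discretized_counts(mean, std, half_width):
    """Probability of each integer alert count within mean +/- half_width."""
    low = max(mean - half_width, 0)
    high = mean + half_width
    probs = {}
    for n in range(low, high + 1):
        # Mass of the unit-width bin centered on n (an assumed binning rule).
        probs[n] = normal_cdf(n + 0.5, mean, std) - normal_cdf(n - 0.5, mean, std)
    total = sum(probs.values())
    # Renormalize so the truncated support sums to one.
    return {n: p / total for n, p in probs.items()}


# Type 1 in Table II(a): mean 6, standard deviation 2, coverage +/-5.
pmf = discretized_counts(6, 2, 5)
print(min(pmf), max(pmf), round(sum(pmf.values()), 6))
```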

We assume alerts are triggered deterministically for each access, a common case in rule-based systems. The alert type that will be triggered for each potential access is provided in Table II(b), where “-” represents a benign access. This table is generated with a probability vector for each employee, which corresponds to alert type vector . Although in reality, benign accesses may be more frequent, we lower their probability to better differentiate the final value of the objective function. The benefit of the adversary for a successful attack, the cost of an attack and the cost of an audit are all directly related to the alert type, which are shown in Table II(a). In addition, the penalty for being caught is set to a constant value of .

Type 1 Type 2 Type 3 Type 4
Mean 6 5 4 4
Std 2 1.6 1.3 1
99.5% Coverage +/-5 +/-4 +/-3 +/-3
Benefit 3.4 3.7 4 4.3
Attack Cost 0.4 0.4 0.4 0.4
Audit Cost 1 1 1 1
(a) Parameter values for alert types in the synthetic setting.
Employee Record
(b) Rules for alert types in the synthetic setting.
TABLE II: Description of Dataset Syn_A.

IV-B Optimal Solution with Varying Budget

Based on the given information, we can compute the optimal OAP solution. First, the search space for audit thresholds in this scenario is as follows: 1) for each alert type, the audit threshold , 2) the sum of thresholds for all alert types should be greater than or equal to , 3) for each type, the upper bound of the audit threshold is where . Concretely, we set vector as the upper bound for finding the optimal solution. Thus, the space of the investigation of the optimal solution is . Note that is also a possible choice, which means the auditor will not check the corresponding alert type. Thus, it is infeasible to directly solve the OAP in the instances with a large number of alert types or large .

To investigate the performance of the proposed audit model, we consider a range of audit budgets $B \in \{2, 4, \ldots, 20\}$, which is wide with respect to the scale of the means of the alert types. We then apply a brute force search to discover an optimal vector of budget thresholds, one per alert type, for each budget. Table III shows the optimal solution of OAP for each candidate $B$, including the optimal value of the objective function, optimal threshold (using the smallest optimal threshold whenever the optimal solution is not unique), pure strategies in the support of the optimal mixed strategy, and the optimal mixed strategy of the auditor. As expected, it can be seen that as the budget increases, the optimal value of the objective function (minimized by the auditor) decreases monotonically.

ID Budget Optimal Objective Value Optimal Threshold Effective Pure Strategy Optimal Mixed Strategy
1 2 12.2945 [1,1,1,1] [2,3,4,1][4,1,3,2][4,2,3,1][4,3,2,1] [0.3566, 0.3780, 0.1210, 0.1444]
2 4 7.7176 [2,1,1,2] [1,2,3,4][2,1,3,4][4,2,1,3][4,2,3,1] [0.4664, 0.0052, 0.0934, 0.4350]
3 6 3.2651 [2,2,2,2] [2,1,3,4][4,1,3,2][4,2,1,3][4,2,3,1] [0.2748, 0.2341, 0.3293, 0.1618]
4 8 -0.4517 [3,3,2,2] [2,1,3,4][4,1,3,2][4,2,1,3][4,2,3,1] [0.0762, 0.4600, 0.1329, 0.3309]
5 10 -2.1314 [3,3,3,3] [1,2,3,4][1,4,3,2][4,1,2,3][4,1,3,2] [0.3926, 0.0788, 0.4080, 0.1206]
6 12 -3.7345 [4,4,3,3] [2,1,3,4][4,2,3,1][4,2,1,3][4,1,3,2] [0.2028, 0.1554, 0.2076, 0.4342]
7 14 -5.1645 [5,4,3,3] [2,1,3,4][4,2,3,1][4,2,1,3][4,1,3,2] [0.3559, 0.2199, 0.3176, 0.1066]
8 16 -6.4510 [6,5,4,4] [2,1,3,4][4,1,3,2][4,2,1,3][4,2,3,1] [0.2431, 0.2636, 0.1728, 0.3205]
9 18 -7.4649 [7,6,5,5] [2,1,3,4][4,1,3,2][4,2,1,3][4,2,3,1] [0.2710, 0.2630, 0.2054, 0.2615]
10 20 -8.1561 [9,7,6,6] [1,2,3,4][4,1,2,3][4,1,3,2][4,2,3,1] [0.2398, 0.1742, 0.2275, 0.3585]
TABLE III: The optimal solution for the auditor under various budgets.

IV-C Findings

Our heuristic methods aim to find an approximate solution through major reductions in computational complexity. In this respect, the search step size $\epsilon$ is a key factor to consider because it could lead the search into a locally optimal solution. To investigate the gap between the achieved objective value and the optimal solution, as well as the influence of $\epsilon$ on the gap, we performed experiments with a series of step sizes. Tables IV and V summarize the results, where each cell consists of two items: 1) the minimized sum of the maximal utilities of all adversaries obtained using the heuristic method and 2) the corresponding audit threshold vector.

There are three findings worth highlighting. First, when $\epsilon$ is fixed, the approximated values of the objective function decrease as the budget increases. This is akin to the trend shown in Table III. Second, when the budget is fixed, the approximated values achieved through ISHM and ISHM+CGGS exhibit a general growth trend as $\epsilon$ increases. This occurs because larger shrink ratios increase the likelihood that the heuristic search will miss more of the good approximate solutions. Third, we find that the ISHM and ISHM+CGGS solutions are close to the optimal. To measure the solution quality as a function of $\epsilon$, we compare the approximate optimal values in Tables IV and V against the optimal values provided by Table III.

In Table VI, it can be seen that ISHM (and solving the linear program to optimality) achieves solutions near of the optimal (as denoted by ) when the step size . Even the approximately optimal solutions with has a good approximation ratio (above ). As such, it appears that if we choose an appropriate , then ISHM can perform well.

When we combine ISHM+CGGS (denoted by ), the approximation quality drops compared to , as we would expect, with the lone exception of (). However, is very close to , which suggests that our approximate column generation method does not significantly degrade the quality of the solution.

Next, we consider the computational burden for ISHM to reach an approximation of the optimal solution. Table VII provides the number of threshold vectors explored under various budgets $B$ and step sizes $\epsilon$. It can be seen that the number of threshold candidates explored decreases as the step size grows. For a given $\epsilon$, the number of thresholds considered by the algorithm initially increases, but then drops as the audit budget increases. The reason that less effort is necessary at the extremes of the budget range is that the restart of the test for a single alert type (to find a better position) is invoked less frequently. By contrast, a larger amount of effort is required in the middle of the budget range due to more frequent restarts (although this yields only a small improvement).

Finally, we investigate the average number of threshold vectors explored by the algorithm over the budget range . For the various step sizes, we represent the results in vector form . Dividing by the number of investigations needed to discover the optimal solution, the resulting ratio vector is . Thus, when (when both and are greater than ), the number of thresholds explored is only of the entire space. As such, by applying ISHM, the number of investigated threshold candidates can be greatly reduced without significantly sacrificing solution quality.

Approximation of Optimal Loss of the Auditor and corresponding thresholds by ISHM
TABLE IV: The approximation of the optimal solutions obtained by ISHM at various levels of and .
Approximation of Optimal Loss of the Auditor and corresponding thresholds by ISHM + CGGS
TABLE V: The approximation of the optimal solutions obtained by ISHM + CGGS at various levels of and .
TABLE VI: The average precision over the budget vector by applying ISHM and ISHM+CGGS.
2 4 6 8 10 12 14 16 18 20
251 267 255 243 235 227 199 207 191 171
128 144 148 140 132 124 108 108 92 84
65 109 101 93 85 85 81 77 69 65
74 66 78 70 70 62 62 62 50 50
35 43 47 47 47 47 43 35 35 35
TABLE VII: The number of threshold vectors checked by ISHM with a given budget and step size .
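The per-step-size averages discussed above can be recomputed directly from the counts in Table VII. The following is a minimal sketch: the variable names are ours, the first table row is assumed to list the budgets, and the remaining rows are kept in the order of the table because the step size associated with each row is not restated here.

```python
# Budgets (first row of Table VII) and threshold-vector counts (one row per step size).
budgets = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
counts = [
    [251, 267, 255, 243, 235, 227, 199, 207, 191, 171],
    [128, 144, 148, 140, 132, 124, 108, 108, 92, 84],
    [65, 109, 101, 93, 85, 85, 81, 77, 69, 65],
    [74, 66, 78, 70, 70, 62, 62, 62, 50, 50],
    [35, 43, 47, 47, 47, 47, 43, 35, 35, 35],
]

# Average number of threshold vectors explored over the budget range, per row.
row_means = [sum(row) / len(row) for row in counts]

# Budget at which each row reaches its maximum count (the "middle of the range" effect).
peak_budgets = [budgets[row.index(max(row))] for row in counts]

for mean, peak in zip(row_means, peak_budgets):
    print(f"mean = {mean:.1f}, peak at budget {peak}")
```

Dividing each mean by the size of the full threshold space would give the ratio vector described in the text; that size is not reproduced here, so the sketch stops at the means.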

V Model Evaluation

The previous results suggest ISHM and CGGS can be efficient and effective in solving the OAP in a small controlled environment. Here, we investigate the performance of the proposed game-theoretical audit model on more realistic and larger datasets. This evaluation consists of comparing the quality of solutions of OAP with several natural alternative auditing strategies.

The first dataset, Rea_A, corresponds to the EMR access logs of Vanderbilt University Medical Center (VUMC). This dataset is notable because VUMC privacy officers rely on this data to conduct retrospective audits to determine if there are accesses that violate organizational policy. The central goal in this use case is to preserve patient privacy. The second dataset, Rea_B, consists of public observations of credit card applications. It labels applicants as having either low or high risk of fraud. We provide an audit mechanism to capture events of credit card fraud based on the features in this dataset.

V-A Data Overview

Rea_A consists of the VUMC EMR access logs for 28 continuous workdays during 2017. There are access events, () of which are repeated accesses.3 We filtered out the repeated accesses to focus on the distinct user-patient relationships established on a daily basis. The mean and standard deviation of daily access events were and , respectively. The features for each event include: 1) timestamp, 2) patient ID, 3) employee ID, 4) patient’s residential address, 5) employee’s residential address, 6) employee’s VUMC department affiliation, and 7) an indication of whether the patient is an employee. We focus on the following alert types: 1) employee and patient share the same last name, 2) employee and patient work in the same VUMC department, 3) employee and patient share the same residential address, and 4) employee and patient are neighbors within a distance threshold.
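The filtering of repeated accesses can be sketched as follows, using the definition in footnote 3 (same employee, same patient, same day). The record layout, identifiers, and function name are hypothetical and chosen only for illustration; they are not the format of the VUMC logs.

```python
from datetime import datetime

def drop_repeated_accesses(events):
    """Keep only the first access per (employee, patient, calendar day)."""
    seen = set()
    distinct = []
    for timestamp, employee_id, patient_id in events:
        key = (employee_id, patient_id, timestamp.date())
        if key not in seen:
            seen.add(key)
            distinct.append((timestamp, employee_id, patient_id))
    return distinct

# Hypothetical log entries: (timestamp, employee ID, patient ID).
events = [
    (datetime(2017, 3, 1, 9, 0), "e1", "p1"),
    (datetime(2017, 3, 1, 15, 30), "e1", "p1"),  # repeat: same pair, same day
    (datetime(2017, 3, 2, 8, 45), "e1", "p1"),   # new day, so it is kept
    (datetime(2017, 3, 1, 10, 0), "e2", "p1"),   # different employee, kept
]
distinct_events = drop_repeated_accesses(events)
print(len(distinct_events))
```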

In certain cases, the same access may generate multiple alerts, each with a distinct type. For example, if a husband, who is a BMRC employee, accesses his wife’s EMR, then two alert types may be triggered: 1 (same last name) and 3 (same address). We therefore redefine the set of alert types to also consider combinations of alert categories. The resulting set of alert types is detailed in Table VIII.

ID Alert Type Description Mean Std
1 Same Last Name 183.21 46.40
2 Department Co-worker 32.18 23.14
3 Neighbor (within 0.5 miles) 113.89 80.44
4 Last Name; Same address 15.43 14.61
5 Last Name; Neighbor (within 0.5 miles) 23.75 11.07
6 Same address; Neighbor (within 0.5 miles) 20.07 11.49
7 Last Name; Same address; Neighbor (within 0.5 miles) 32.07 16.54
TABLE VIII: Description of the EMR alert types.
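One way to picture the redefinition is as a lookup from the set of base rules an access triggers to a single composite type ID from Table VIII. This is only an illustrative sketch: the rule names and the handling of unlisted combinations (returning None) are our assumptions, not part of the original labeling procedure.

```python
# Composite alert types of Table VIII, keyed by the set of base rules triggered.
# The rule names are hypothetical labels chosen for this sketch.
COMPOSITE_TYPES = {
    frozenset({"last_name"}): 1,
    frozenset({"coworker"}): 2,
    frozenset({"neighbor"}): 3,
    frozenset({"last_name", "same_address"}): 4,
    frozenset({"last_name", "neighbor"}): 5,
    frozenset({"same_address", "neighbor"}): 6,
    frozenset({"last_name", "same_address", "neighbor"}): 7,
}

def composite_alert_type(triggered_rules):
    """Return the Table VIII ID, or None if benign or the combination is not listed."""
    return COMPOSITE_TYPES.get(frozenset(triggered_rules))

# The husband-and-wife example: same last name and same address map to type 4.
print(composite_alert_type(["last_name", "same_address"]))
```

With this mapping, every access carries at most one alert type, which is what the audit model requires.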

We label each access event in the logs with a corresponding alert type or as “benign” (i.e., no alerts generated). To evaluate our methods, we choose a random sample of employees and patients who generate at least one alert. This set of employees and the set of patients then results in potential accesses, where each employee can access each patient.

We let the probability that an employee could be malicious be , which is artificially high, but enables us to clearly compare the methods. The benefit vector for the adversary is for the corresponding categories of alert types (1-7 in Table VIII). The penalty for capture is set to . We set the cost of both an attack and an audit to . We acknowledge that the model parameters are ad hoc, but this does not affect the results of our comparative analysis. In practice, these parameters would be set based on expert opinion, but doing so is outside the scope of this study.

Rea_B is the Statlog (German Credit Data) dataset available from the UCI Machine Learning Repository. Rea_B contains credit card applications. It is composed of attributes describing the status of the applicants pertaining to their credit risk. Before issuing a credit card, banks determine whether an application could be fraudulent based on the features in the data. Nevertheless, no screening process is perfect, and given a large number of applications, some will require retrospective audits to determine whether specific applications should be canceled. Thus, alerts in this setting aim to indicate potential fraud, and a subset of such alerts are chosen for a time-consuming auditing process. Leveraging the provided features, we define alert types, which are triggered by the specific combinations of attribute values and the purpose of application. The selected purposes of application are the “victims” in our audit model. Table IX summarizes how alerts are triggered. In the description field, italicized words represent the purpose of the application, while the other words represent feature values.

We used the alert categorizations discussed above to label the applications with alert types, excluding any that fail to receive a label. Among these, we randomly selected applicants who may choose to “attack” one of the purposes of credit card applications, for a total of possible events. The benefit vector for the adversary is for each of the alert types generated, respectively. We set the penalty for detection to and costs for attack and audit were both set to . Again, to facilitate comparison we set in all cases.

ID Alert type Description Mean Std
1 No checking account, Any purpose 370.04 15.81
2 Checking , New car, Education 82.42 7.87
3 Checking , Unskilled, Education 5.13 2.08
4 Checking , Unskilled, Appliance 28.21 5.25
5 Checking , Critical account, Business 8.31 2.96
TABLE IX: Description of the defined alert types.

V-B Comparison with Baseline Alternatives

The performance of the proposed audit model was investigated by comparing it with several natural alternative audit strategies as baselines. The first alternative is to randomize the audit order over alert types, which we call Audit with random orders of alert types. Though random, this strategy mimics the reality of random reporting (e.g., where a random patient calls a privacy official to look into alleged suspicious behavior with respect to the use of their EMR). In this case, we adopt the thresholds out of the proposed model with to investigate the performance. The second alternative is to randomize the audit thresholds. We refer to this policy as Audit with random thresholds. For this policy, we assume that 1) the auditor’s choice satisfies and 2) the auditor has the ability to find the optimal audit order after deciding upon the thresholds. The third alternative is a naive greedy audit strategy, where the auditor prioritizes alert types according to their utility loss (i.e., greater consequence of violations). In this case, the auditor investigates as many alerts of a certain type as possible before moving on to the next type in the order. For our experiments, the alert type order is based on the loss of the auditor, which is the benefit the adversary receives when they execute a successful attack. Thus, we refer to this strategy as Audit based on benefit.
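To make the third baseline concrete, the sketch below audits alert types in decreasing order of adversary benefit, exhausting each type before moving to the next. It is an illustration only: the function name, the assumption of a single uniform audit cost per alert, and all numbers are hypothetical rather than the parameters used in our experiments.

```python
def greedy_audit_by_benefit(alert_counts, benefits, audit_cost, budget):
    """Audit alert types in decreasing order of adversary benefit.

    Returns a dict mapping alert type -> number of alerts audited.
    """
    audited = {}
    remaining = budget
    for alert_type in sorted(benefits, key=benefits.get, reverse=True):
        # Audit as many alerts of this type as the remaining budget allows.
        affordable = int(remaining // audit_cost)
        n = min(alert_counts.get(alert_type, 0), affordable)
        audited[alert_type] = n
        remaining -= n * audit_cost
    return audited

# Hypothetical realized alert counts and adversary benefits for three alert types.
alert_counts = {1: 5, 2: 3, 3: 4}
benefits = {1: 2.0, 2: 5.0, 3: 3.5}
plan = greedy_audit_by_benefit(alert_counts, benefits, audit_cost=1, budget=6)
print(plan)
```

Because the order is fixed and known, an adversary can simply target the alert type that the budget never reaches (type 1 in this toy instance), which is the weakness the comparison is meant to expose.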

The following performance comparisons are assessed over a broad range of auditing budgets. For our model, we present the values of the objective function with three different instances of the step size in ISHM: . Figures 1 and 2 summarize the performance of the proposed audit model and three alternative audit strategies for Rea_A and Rea_B, respectively.

For dataset Rea_A, the range of was set to through . The budget of covers about of the sum of the means of the seven alert types. In reality, such coverage is quite high. By applying the proposed audit model, we approximately solve the OAP given and . For Audit with random orders of alert types, we assign the audit thresholds using ISHM with . The randomization is repeated times without replacement. As for Audit with random thresholds, we randomly generate the audit thresholds to solve the corresponding LP, which are repeated times. For Audit based on benefit, we randomly sample instances of based on the distributions of alert types learned from the dataset.

Based on Figure 1, there are several findings we wish to highlight. First, in our model, as the audit budget increases, the auditor’s loss decreases. At the high end, when , the auditor’s loss is zero, which, in the VUMC audit setting, implies that all the potential adversaries are deterred from an attack. This valuation of is smaller than of the sum of distribution means of all alert types. This occurs because, as the audit budget increases, the audit model finds better approximations of the optimal audit thresholds, which, in turn, enables the auditor to significantly limit the potential gains of the adversaries. Second, our proposed model significantly outperforms all of the baselines. Third, even though Audit with random orders of alert types uses approximated audit thresholds, the auditor’s loss is substantially greater than under our proposed approach. However, the auditor’s loss for the alternatives approaches ours when . This is because the thresholds are , such that the audit order is less of a driver than in other situations. Fourth, Audit based on benefit tends to have very poor performance compared to other policies. This is because when the audit order is fixed (or is predictable), adversaries have greater evasion ability and attack more effectively. Fifth, Audit with random thresholds tends to outperform the other baselines, but is still significantly worse than our approach. This is because the auditor has the ability to search for the optimal audit policy, but the thresholds are randomly assigned such that they are hampered in achieving the best solution.

Fig. 1: Auditor’s loss in the proposed and baseline models in the Rea_A dataset.

For the credit card application scenario, Figure 2 compares the auditor’s loss in our heuristics and the three baselines. For dataset Rea_B, the range for is to with a step size of . As expected, as the budget increases, the auditor sustains a decreasing average loss. It can be seen that the proposed audit model significantly outperforms the alternative baselines. Specifically, as the auditing budget increases, the auditor’s loss trends towards, and eventually reaches, zero in our approach. This means that the attackers are completely deterred. For the alternatives, as before, Audit with random thresholds outperforms the other strategies. And, just as before, the strategy that greedily audits alert types (in order of loss) tends to perform quite poorly.

Fig. 2: Auditor’s loss in the proposed and baseline models in the Rea_B dataset.

VI Related Work

The development of computational methods for raising and subsequently managing alerts in database systems is an active area of research. In this section, we review recent developments that are most related to our investigation.

Alert Frameworks

Generally speaking, there are two main categories by which alerts are generated in a TDMT: 1) machine learning methods – which measure the distance from either normal or suspicious patterns [mathew2010data, kamra2009survey, kamra2008detecting], and 2) rule-based approaches – which flag the occurrences of predefined events when they are observed [cook2004rule, samu2002database, ben2008system]. Concrete implementations are often tailored to distinct application domains.

In the healthcare sector, methods have been proposed to find misuse of EMR systems. Boxwala et al. [boxwala2011using] treated it as a two-label classification problem and trained support vector machines and logistic regression models to detect suspicious accesses. Given that not all suspicious accesses follow a pattern, various techniques have been developed to determine the extent to which an EMR user [chen2012detecting] or their specific access [chen2012specializing] deviated from the typical collaborative behavior. By contrast, Fabbri et al. [fabbri2013explaining, fabbri2013select, fabbri2011explanation] designed an explanation-based auditing mechanism which generates and learns typical access patterns from an expert-driven, as well as a data-driven, view. EMR access events by authenticated employees can be explained away by logical relations (e.g., a patient scheduled an appointment with a physician), while the residual can trigger alerts according to predefined rules (e.g., co-workers) or simply fail to have an explanation. The remaining events are provided to privacy officials for investigation; however, in practice, only a tiny fraction can feasibly be audited due to resource limitations.

In the financial sector, fraud detection [ngai2011application] in credit card applications assists banks in mitigating their losses and protecting consumers [delamaire2009credit]. Several machine learning-based models [bhattacharyya2011data] have been developed to detect fraudulent behavior. Notable examples include hidden Markov models [srivastava2008credit], neural networks [chan1999distributed], and support vector machines [chen2006new]. Rule-based techniques were also integrated into some detection frameworks [brause1999neural, sternberg1997using, syeda2002parallel, yeh2009comparisons]. While these methods trigger alerts for investigators, they result in a significant number of false positives, a problem which can be mitigated through alert prioritization schemes.

Alert Burden Reduction

Various methods have been developed to reduce the volume of alerts generated in database systems. Many focus on reducing redundancy and clustering alerts based on their similarity. In particular, a cooperative module was proposed for intrusion detection, which implemented functions of alert management, clustering and correlation [cuppens2002alert]. Xiao et al. proposed a multilevel alert fusion model to abstract high-level attack scenarios to reduce redundancy [xiao2008alert]. As an alternative, fuzzy set theory was applied by Maggi et al. to design robust alert aggregation algorithms [maggi2009reducing]. Also, a fuzzy-logic engine to prioritize alerts was introduced by Alsubhi et al. by rescoring alerts based on a few metrics [alsubhi2012fuzmet]. Njogu et al. built a robust alert cluster by evaluating the similarity between alerts to improve the quality of those sent to analysts [njogu2010using]. However, none of these approaches consider the impact of alert aggregation and prioritization on decisions by potential attackers, especially as the latter may choose attacks that circumvent the prioritization and aggregation mechanisms.

Security Games

Our general model is related to the literature on Stackelberg security games [kiekintveld2009computing], where a single defender first commits to a (possibly randomized) allocation of defense resources, while the attacker chooses an optimal attack in response based on observation. Such models have been applied in a broad variety of security settings, such as airport security [pita2008deployed], coast guard patrol scheduling [an2013deployed], and even for preventing poaching and illegal fishing [fang2017paws]. However, models used in much of this prior work are specialized to physical security, and do not readily generalize to the problem of prioritizing alerts for auditing. This is the case even for several efforts specifically dealing with audit games [blocki2013audit, blocki2014audit], which abstract the problem into a set of targets that could be attacked, so that the structure of the model remains essentially identical to physical security settings. In practical alert prioritization and auditing problems, in contrast, a crucial consideration is that there are many potential attackers, and many potential victims or modes of attack for each of these. Moreover, auditing policies involve recourse actions where the specific alerts audited depend on the realizations of alerts of various types. Since alert realizations are stochastic, this engenders complex interactions between the defender and attackers, and results in a highly complex space of prioritization policies for the defender. In an early investigation on alert prioritization, it was assumed that 1) the identity of a specific attacker was unknown and 2) an exhaustive auditing strategy across alert types of a given order would be applied [laszka2017game]. These assumptions were relaxed in the investigation addressed by our current study.

Recently, the problem of assigning alerts to security analysts has been introduced [ganesan2016dynamic], with a follow-up effort casting it within a game theoretic framework [Schlenker2017donot]. Our framework addresses two key limitations of that work: 1) it considers only a single attacker, whereas auditing decisions in the context of access control policies commonly involve many potential attackers, with most never considering the possibility of an attack; 2) it assumes that the number of alerts in each category is known a priori to both the auditor and attacker. In practice, alert counts by category are stochastic and can exhibit high variance.

VII Discussion and Conclusions

TDMTs are usually deployed in database systems to address a variety of attacks that originate from within and beyond an organization. However, the resulting alert volume is often far beyond the capability of auditors with limited resources. Our research illustrates that policy compliance auditing, as a significant component of database management, can be improved by prioritizing which alerts to focus on via a game theoretic framework, allowing auditing policies to make best use of limited auditing resources while simultaneously accounting for strategic behavior of potential policy violators. This is notable because auditing is critical to a wide range of management requirements, including privacy breach and financial fraud investigations. As such, this model and the effective heuristics we offer in this study fill a major gap in the field.

There are several limitations of our approach that we wish to highlight as opportunities for future investigations. First, there are limitations to the parameterization of the game. One notable aspect is that we assumed that the game has a zero-sum property. Yet in reality, this may not be the case. For example, an auditor is likely to be concerned less about the cost incurred by an adversary for executing an attack and more concerned about the losses that arise from successful violations. Additionally, while our experiments show the proposed audit model outperforms natural alternatives, it is unclear how sensitive this result is to parameter variations. Thus, further investigation of this sensitivity is needed.

A second set of limitations stems from the assumptions we rely upon. In particular, we assumed that each attack is instantaneous, which turned the problem into a one-shot two-stage game. However, attacks in the wild may require multiple cycles to fully execute, such that the auditor may be able to capture the attacker before they complete their exploit. Furthermore, our model is predicated on an environment in which the auditor has complete knowledge of the set of potential adversaries, including their identities.

A third limitation is in the economic premise of the attack. Specifically, we assumed that the auditor and the adversaries are fully rational. In reality, adversaries may be bounded in their rationality, and an important extension would be to generalize the model to consider such behavior.

VIII Acknowledgement

This work was supported, in part, by grant R01LM10207 from the National Institutes of Health; grants CNS-1526014, CNS-1640624, IIS-1649972, and IIS-1526860 from the National Science Foundation; grant N00014-15-1-2621 from the Office of Naval Research; and grant W911NF-16-1-0069 from the Army Research Office.


Appendix A

Proof of Theorem 1


We reduce from the 0-1 Knapsack problem defined by Definition 1.

We begin by constructing a special case of the auditing problem and work with the decision version of optimization Equation 4, in which we decide whether the objective is below a given threshold . First, suppose that for all alert types with probability 1. Since the set of orders is a singleton, the probability distribution over orders is not relevant. Consequently, it suffices to consider for all , and the actual order over types is not relevant because for all types. Consequently, we can choose to select an arbitrary subset of types to inspect subject to the budget constraint (i.e., type will be audited iff ). Thus, the choice of is equivalent to choosing a subset of alert types to audit.

Suppose that , and each victim deterministically triggers some alert type for any attacker . Let for all , and suppose that for every , there is a unique type with if and only if and 0 otherwise. Then if and only if (i.e., alert type is not selected by the auditor) and 0 otherwise. Finally, we let for all .4

For the reduction, suppose we are given an instance of the 0-1 Knapsack problem. Let , and for each , generate attackers with . Thus, . Let be the cost of auditing alerts of type , and let . Define . Now observe that the objective in Equation 4 is below if and only if , or, equivalently, if there is such that . Thus, the objective of Equation 4 is below if and only if the Knapsack instance has a subset of items which yield , where must satisfy the same budget constraint in both cases.
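The correspondence used in the reduction can be illustrated on a toy instance. The sketch below rests on our reading of the construction and should be treated as an assumption: each Knapsack item becomes an alert type, the item's weight becomes the cost of auditing that type, the item's value becomes the number of attackers who trigger it, and an attacker is deterred exactly when their type is audited. The function name and all numbers are hypothetical.

```python
def max_deterred(costs, attackers, budget):
    """0-1 Knapsack by dynamic programming.

    Returns the largest number of attackers covered by auditing a subset
    of alert types whose total (integer) audit cost is within the budget.
    """
    best = [0] * (budget + 1)
    for cost, count in zip(costs, attackers):
        # Iterate budgets downward so that each alert type is used at most once.
        for b in range(budget, cost - 1, -1):
            best[b] = max(best[b], best[b - cost] + count)
    return best[budget]

costs = [3, 4, 2]      # Knapsack weights = audit cost per alert type
attackers = [4, 5, 3]  # Knapsack values = attackers triggering each type
budget = 5

covered = max_deterred(costs, attackers, budget)
undeterred = sum(attackers) - covered  # attackers whose type is not audited
print(covered, undeterred)
```

In this special case, minimizing the number of undeterred attackers is the same as maximizing the Knapsack value, so deciding whether the auditing objective can be pushed below a threshold answers the Knapsack decision question.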


  1. “Good” in this context means that shrinking thresholds within the subset improves the value of the objective function.
  2. The artificially high incidence of attacks here is merely to facilitate a comparison with a brute-force approach.
  3. We define a repeated access as an access that is committed by the same employee to the same patient’s EMR on the same day.
  4. While this is inconsistent with our assumption that attackers constitute only a small portion of the system users, we note that this is only a tool for the hardness proof.