
Optimizing Password Composition Policies

Abstract

A password composition policy restricts the space of allowable passwords to eliminate weak passwords that are vulnerable to statistical guessing attacks. Usability studies have demonstrated that existing password composition policies can sometimes result in weaker password distributions; hence a more principled approach is needed. We introduce the first theoretical model for optimizing password composition policies. We study the computational and sample complexity of this problem under different assumptions on the structure of policies and on users’ preferences over passwords. Our main positive result is an algorithm that, with high probability, constructs almost optimal policies (which are specified as a union of subsets of allowed passwords), and requires only a small number of samples of users’ preferred passwords. We complement our theoretical results with simulations using a real-world dataset of million passwords.

Keywords: Password composition policy, Sampling, Computational complexity

General Terms: Algorithms, Economics, Security, Theory

ACM Reference Format: Jeremiah Blocki, Saranga Komanduri, Ariel D. Procaccia and Or Sheffet, 2013. Optimizing Password Composition Policies. Publication date: February 2013.

Authors’ addresses: J. Blocki, Computer Science Department, Carnegie Mellon University, email: jblocki@cs.cmu.edu; S. Komanduri, Human Computer Interaction Institute, Carnegie Mellon University, email: sarangak@cs.cmu.edu; A. D. Procaccia, Computer Science Department, Carnegie Mellon University, email: arielpro@cs.cmu.edu; O. Sheffet, Computer Science Department, Carnegie Mellon University, email: osheffet@cs.cmu.edu.
This research was supported in part by the National Science Foundation Science and Technology TRUST, by the National Science Foundation under grants DGE-0903659, CNS-1116776, CCF-1101215 and CCF-1116892, by CyLab at Carnegie Mellon under grants DAAD19-02-1-0389 and W911NF-09-1-0273 from the Army Research Office, by the AFOSR MURI on Science of Cybersecurity, by a gift from Microsoft Research and by a NSF Graduate Research Fellowship.

1 Introduction

Imagine a web surfer, an online shopper, or a reviewer in a prominent CS and Economics conference who logs on to a server for the first time so that she can sign up for some service, place a shopping order, or view a list of assigned papers. Such a user registers on the server by choosing a username and picking a password. Naturally, our user’s first attempt at picking a password is her favorite combination ‘123456’, which the server declines. She then has to pick a password that follows certain guidelines: of suitable length, involving lower- and upper-case letters, with numbers or special characters, etc. Such password composition policies defend against the “first line” of attack: guessing attacks by uninformed attackers (attackers with no previous knowledge of the user whose account they are trying to break into).

Password composition policies are a necessity because, without them, user-selected passwords are predictable. Indeed, many unrestricted users would select simple passwords like ‘123456’, ‘password’ and ‘letmein’ [11]. Furthermore, this issue is of great importance to today’s economy. Passwords are commonly used in electronic commerce to protect financial assets. In fact, the passwords themselves have financial value. Symantec reported that compromised passwords are sold for between $4 and $30 on the black market [13], and a 2004 Gartner case study [27] estimated that it cost a large firm over $17 per password-reset call. Nevertheless, existing password composition policies are typically not principled, and do not necessarily result in less common passwords. For example, studies show that users respond to restrictions in predictable ways [19], or pick weaker passwords due to user fatigue [8, 20].

In this paper, we initiate the algorithmic study of password composition policies. Such policies restrict the space of passwords to a subset of allowed passwords, and force each user to pick a password in this subset. Thus, users induce a distribution over passwords, in which the probability of a password is the fraction of users who pick it. By declaring different subsets of allowed passwords, different password composition policies induce different distributions. Our work formalizes and addresses the algorithmic problem a server administrator faces when designing a password composition policy; we ask:

In what settings can the information about the users’ preferences over passwords allow us to design a password composition policy that is guaranteed to induce a password distribution as close to uniform as possible?

We wish to stress at this point that we do not take a cryptographic approach to the problem: we do not design a protocol aimed at amplifying a password’s strength, nor do we rely on standard cryptographic assumptions or techniques in designing our password composition policies. Single-factor authentication does not defend against an attacker who learns about the most probable password from an external source. Furthermore, because password systems often allow users multiple attempts in entering their password, an attacker can make a small number of guesses with impunity. Therefore, we instead focus on the design and analysis of algorithms for optimizing the password composition policy’s induced distribution over passwords, and in our theoretical results compare the performance of our algorithm to the optimal policy among exponentially many potential policies in the worst case.

1.1 Our Model

We study the algorithmic problem of optimizing password composition policies along multiple dimensions: the goal, the user model, and the policy structure.

Goal. We focus on designing a policy that maximizes the minimum entropy of the resulting password distribution. Specifically, we assume the server deals with $n$ users, each picking a password from some space of passwords that respects the server’s password composition policy. These passwords form a distribution over the domain of all allowed passwords and our goal is to minimize the probability of the most likely password. This is a natural goal (see Section 7), as opposed to maximizing the Shannon entropy of the distribution, which for example is still high even if half the people choose the same password and the other half choose a password uniformly at random from the password space. From a security standpoint, the probability of the most likely password (which determines the minimum entropy) is the fraction of accounts that could be compromised in one guess. For example, an adversary would be able to crack of RockYou passwords [15] with only one guess. Alternatively, should the attacker attempt to break into only one account, this probability is the likelihood that the account is compromised on the first guess. We also consider a slightly stronger goal of minimizing the fraction of accounts that could be compromised using $k$ guesses, that is, the overall probability of the $k$ most likely passwords [6].
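The contrast between Shannon entropy and minimum entropy described above can be checked numerically. The following is a minimal sketch, assuming a toy distribution in which half of the users pick ‘123456’ and the other half spread uniformly over 1024 hypothetical passwords; the function names and the toy distribution are illustrative and are not taken from the paper.

```python
import math


def shannon_entropy(dist):
    """Shannon entropy (in bits) of a {password: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def min_entropy(dist):
    """Minimum entropy (in bits): -log2 of the most likely password's probability."""
    return -math.log2(max(dist.values()))


def top_k_mass(dist, k):
    """Total probability of the k most likely passwords (accounts lost to k guesses)."""
    return sum(sorted(dist.values(), reverse=True)[:k])


# Toy distribution (an assumption for illustration only): half of the users
# share one password, the rest are uniform over M other passwords.
M = 1024
skewed = {"123456": 0.5}
skewed.update({f"pw{i}": 0.5 / M for i in range(M)})

print(shannon_entropy(skewed), min_entropy(skewed), top_k_mass(skewed, 1))
```

In this toy case the Shannon entropy is 6 bits while the minimum entropy is only 1 bit, which is why the most likely password's probability is the more informative target for a one-guess attacker.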

User model. We consider two models for how users select passwords when presented with a password composition policy.

In the ranking model, each user has an implicit ranking over passwords, from the most preferred to the least preferred. Given a password policy, each user selects the highest-ranking password among those allowed by the policy. There is a distribution over the space of rankings that determines the fraction of users with each possible ranking. Note that for any password composition policy, such a distribution over rankings induces a distribution over the most preferred allowed passwords.

In the normalization model, there is a distribution over the space of all passwords. This distribution tells us the likelihood that an unrestricted user would select a given password. Given a password composition policy, it induces a new distribution over the allowed passwords (which can be obtained by normalizing the original probabilities of the allowed passwords). When we ban a password, the fraction of users that prefer each allowed password grows; the natural interpretation is that users who preferred an allowed password still use that password, but users who preferred a banned password are redistributed among the allowed passwords according to the induced distribution.

As we show, the normalization model is strictly more restrictive than the ranking model: any distribution in the normalization model can be simulated in the ranking model, but there exist hardness results for the ranking model that do not hold for the normalization model.

Policy structure. We consider the best policy that is restricted to manipulation of a given set of rules — each rule is simply a predefined subset of potential passwords. These rules are given to us as part of the problem (see Section 7 for a discussion of this point). If we interpret a rule as a subset of banned passwords (e.g., passwords shorter than seven characters), its complement (e.g., passwords of at least seven characters) can be interpreted as a subset of allowed passwords. As such, when we take the union of rules we get either a set of banned passwords (negative rules) or allowed passwords (positive rules); this is our password composition policy. While the distinction between the two cases may at first seem a mere technicality, it is in fact quite significant due to the following observation. If we ban the union of rules then in order to ban a password that was picked by too many users, we may ban any rule that contains this password. In contrast, if we allow a union of rules then in order to ban this password we must not allow any rule that contains it. In other words, when our goal is to discard a password in the negative rules setting, we have multiple ways to do so. When our goal is to discard a password in the positive rules setting, we have only one way to do so — excluding all rules that allow this password. As we shall see, this seemingly small difference leads to a clear separation between the two scenarios in terms of the complexity of designing optimal policies.

We pay special attention to the case where each password has its own singleton rule. In this setting, a policy can be interpreted as a “blacklist” of banned passwords that do not necessarily share common characteristics. Note that when each password has its own singleton rule, it does not matter whether these rules are positive or negative.

1.2 Our Results

As we noted above, a password composition policy induces a distribution over most preferred passwords (in both user models). Hence we can study algorithms that sample these distributions. One can obtain such samples by asking random users to choose a password that is constrained by a certain policy. Clearly, though, we need the number of samples to be “small”. The size of the space of all passwords, which we denote by $N$, is typically very large (e.g., the space can include all passwords that are no longer than ASCII characters). We wish to maximize entropy using a number of samples that does not depend on $N$.

Before tackling this goal directly, we study the problem in a simpler setting where the preferences of all users are given to us as input (i.e., there is no uncertainty). In particular, here the password space is a part of the input and algorithms are allowed to run in time polynomial in the number of passwords $N$. The computational complexity of problems in this setting informs their study in the sampling setting: it is hopeless to design efficient sampling algorithms for problems that are computationally hard, but computationally tractable problems may (or may not) have efficient sampling algorithms.

Table 1.2 summarizes our complexity results. The parameter $k$ refers to our optimization target: minimizing the likelihood of the $k$ most likely passwords. Some results are direct corollaries of others, using the fact that singleton rules are a special case of positive rules and the fact that the normalization model is a special case of the ranking model (see Section 2). Looking at the table one immediately notices a clear separation between negative rules and positive rules: optimization using the latter is much easier.

Table 1.2: Summary of Complexity Results.

| | Ranking Model, Constant $k$ | Ranking Model, Large $k$ | Normalization Model, Constant $k$ | Normalization Model, Large $k$ |
|---|---|---|---|---|
| Singleton rules | P | NP-Hard (Thm 3.2); APX-Hard w/ UGC (Thm 3.2) | P | P (Thm 4.1) |
| Positive rules | P (Thm 3.1) | NP-Hard | P | NP-Hard (Thm 4.3) |
| Negative rules | -approx is NP-hard (Thm 3.3) | NP-Hard | NP-Hard (Thm 4.2) | NP-Hard |

We therefore focus on positive rules in our attempt to design an efficient sampling algorithm. Our main result is the best one could hope for in this setting. We design an algorithm that works in the more general ranking model, and finds a policy whose entropy is -close to optimal with probability , for any given . The required number of samples is polynomial in , , and the number of positive rules . We can assume that is small, because each rule corresponds to a subset of passwords that can be concisely described to users.

These results can be applied in a practical setting, and we show this through simulated sampling experiments using natural rules and a large dataset of real passwords. The experimental results provide evidence for the difficulty of the negative rules setting: we search all combinations of rules to find the optimal policy and then attempt to discover this policy by making decisions both randomly and with a heuristic. In the negative rules setting, neither approach succeeded at finding the optimal policy after hundreds of iterations at various sample sizes, and average-case performance did not improve with sample size. In the positive rules setting, the average-case performance of our efficient algorithm improved with sample size and, with a moderate sample size, found policies that were either optimal or very close to optimal.

1.3 Related Work

It has been repeatedly demonstrated that users tend to select easily guessable passwords [15, 11, 4], and NIST recommends that organizations “should also ensure that other trivial passwords cannot be set” to thwart potential attackers [23]. Unfortunately, this task is more difficult than it might appear at first. Policies were initially developed without empirical data to support them, since such data was not available to policy designers [7]. When hackers leaked the RockYou dataset to the Internet, researchers (and attackers) suddenly had access to password data, leading to many insights into true passwords [26]. However, recent research analyzing leaked datasets from non-English speakers, notably Hebrew and Chinese-language websites, shows that trivial password choices can vary between contexts, making a simple blacklist approach ineffective [5]. This means that, depending on the context, a policy based on leaked password data might provide no security guarantee, and relying on such data raises ethical issues as well.

To combat this issue, researchers have turned to a sampling approach. Bonneau 2012 added a system for sampling to the Yahoo! password infrastructure. This system allows one to gain empirical data about the frequency distribution of passwords without revealing the passwords themselves. Such approaches provide a way of gathering empirical data about passwords while maintaining the anonymity of users. Our algorithms could be used in conjunction with such an infrastructure to optimize policies.

Komanduri et al. 2011 studied the effectiveness of several basic password composition policies by using Amazon’s Mechanical Turk to conduct a large scale user study. They found that people often respond to restrictions in predictable ways (e.g., if the password needs to contain a capital letter users might tend to capitalize the first letter of a password) and provide very general recommendations for password composition policies. However, no theoretical model has been proposed for studying the password composition problem.

Schechter et al. 2010 suggest using a popularity oracle to prevent individual passwords that have been used too frequently from being selected by new users. They also proposed using the count-min sketch data structure [9] to build such a popularity oracle. Malone and Maher 2012 suggest a similar system using a Metropolis-Hastings scheme to force an approximately uniform distribution on passwords. Usability results on the effectiveness of dictionary checks [19] suggest that such policies would be very frustrating since the policy is hidden from users behind an oracle. In contrast, we seek to construct optimal policies from combinations of rules that are visible to the user and can be described in natural language.

This consideration of users is important to electronic commerce, even where security is concerned. Florencio and Herley 2010 studied the economic factors that drive institutions to adopt strict password composition policies and find that they often value the user experience over security. An e-mail provider like Yahoo! might adopt simple composition policies because a frustrated user could easily switch to Gmail, while universities are free to adopt strict policies because users cannot switch easily.

2 A Model of Password Composition Policies

We use $\mathcal{P}$ to denote the space of all possible passwords, and $N = |\mathcal{P}|$ to denote the total number of passwords. We denote the number of users by $n$.

A password composition policy may be specified in terms of rules. A rule is a subset of passwords (e.g., the set of all passwords with more than seven characters). We use $R_1, \ldots, R_m \subseteq \mathcal{P}$ to denote a list of rules that may be active or inactive. We consider two schemes.

  • Positive Rules: A password is allowed if and only if it is allowed by some active positive rule. Formally, a password composition policy $\mathcal{A}_S = \bigcup_{i \in S} R_i$ is specified by a set $S \subseteq \{1, \ldots, m\}$ of active rules. In this setting rules should consist of sets of passwords which we expect to be strong (e.g., $R_i$ might be the set of all passwords longer than 10 characters, or the set of all passwords that use both upper and lowercase letters, or the set of all passwords that do not include a dictionary word).

  • Negative Rules: A password is allowed if and only if it is not contained in any active negative rule. Formally, a solution $\mathcal{A}_S = \mathcal{P} \setminus \bigcup_{i \in S} R_i$ is given by a subset $S \subseteq \{1, \ldots, m\}$ of active rules. A negative rule should consist of passwords that we expect to be weak (e.g., $R_i$ might be the set of all passwords without an uppercase letter, or the set of all passwords shorter than 6 characters, or the set of all passwords that include a dictionary word).

We also consider the special case of singleton rules, where our rules are $R_w = \{w\}$ for every $w \in \mathcal{P}$. Equivalently, we are allowed to ban or allow any individual password.
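The two schemes can be sketched in a few lines of code. This is a minimal illustration, assuming rules are represented as Python sets and a policy as a set of active rule indices; the function names, the toy password universe, and the three toy rules are hypothetical and chosen only to show the asymmetry between the schemes.

```python
def allowed_positive(rules, active):
    """Positive rules: a password is allowed iff some active rule contains it."""
    allowed = set()
    for i in active:
        allowed |= rules[i]
    return allowed


def allowed_negative(universe, rules, active):
    """Negative rules: a password is allowed iff no active rule contains it."""
    banned = set()
    for i in active:
        banned |= rules[i]
    return set(universe) - banned


# Hypothetical toy instance (not from the paper).
universe = {"123456", "password", "Password1", "correcthorse"}
rules = [
    {"123456", "password"},        # toy rule 0
    {"password", "correcthorse"},  # toy rule 1
    {"Password1"},                 # toy rule 2
]

# Negative rules: activating either rule 0 or rule 1 bans "password".
print(allowed_negative(universe, rules, {0}))
print(allowed_negative(universe, rules, {1}))

# Positive rules: "password" is banned only if rules 0 and 1 are both inactive.
print(allowed_positive(rules, {1, 2}))
print(allowed_positive(rules, {2}))
```

With negative rules there are several ways to discard a given password, whereas with positive rules there is exactly one: deactivate every rule that contains it.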

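We use $p(w, \mathcal{A})$ to denote the probability of a password $w$ given composition policy $\mathcal{A}$. For $w \notin \mathcal{A}$ we have $p(w, \mathcal{A}) = 0$. Given a set $W \subseteq \mathcal{P}$ we will also use $p(W, \mathcal{A}) = \sum_{w \in W} p(w, \mathcal{A})$. We use $p_k(\mathcal{A}) = \max_{W \subseteq \mathcal{P},\, |W| = k} p(W, \mathcal{A})$ to denote the probability of the $k$ most popular passwords. Intuitively, $p_k(\mathcal{A})$ represents the probability that an adversary can successfully guess a password using $k$ attempts. To avoid cumbersome notation we sometimes use $p_1$ to denote the probability of the most popular password. Similarly, we use $p^{(2)}$ (resp., $p^{(i)}$) to denote the probability of the second (resp., $i$’th) most popular password.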

We consider two user models that determine how users choose passwords under a given password composition policy.

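  • The ranking model: A ranking $\ell$ is simply a permutation of $\mathcal{P}$, which represents a user’s password preferences. It can be represented using an ordered list $\ell = (w_1, \ldots, w_N)$; the user prefers password $w_i$ to $w_j$ for all $i < j$. The ranking naturally tells us which password the user will pick under any composition policy $\mathcal{A}$. Specifically, the user will use password $w_i$ where $i = \min\{j : w_j \in \mathcal{A}\}$. Given a distribution $\mathcal{D}$ over rankings, we have
$$p(w, \mathcal{A}) = \Pr_{\ell \sim \mathcal{D}}\left[\, w \text{ is the most preferred password of } \ell \text{ in } \mathcal{A} \,\right].$$

  • The normalization model: Let $\mathcal{D}$ be an initial distribution over $\mathcal{P}$, and let $p(w) = \Pr_{\mathcal{D}}[w]$. If we select the composition policy $\mathcal{A}$ then the probabilities of all $w \in \mathcal{A}$ are simply re-normalized so that
$$p(w, \mathcal{A}) = \frac{p(w)}{\sum_{w' \in \mathcal{A}} p(w')}.$$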

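Clearly it holds for both models that the probability of an allowed password monotonically increases as one bans more passwords. Formally, for all $w \in \mathcal{P}$ and $\mathcal{A}' \subseteq \mathcal{A} \subseteq \mathcal{P}$ such that $w \in \mathcal{A}'$ we have

$$p(w, \mathcal{A}) \le p(w, \mathcal{A}'). \qquad (1)$$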

Another important observation is that for our purposes the ranking model is more general than the normalization model. Indeed, we argue that a distribution $\mathcal{D}$ over passwords in the normalization model induces an equivalent distribution over rankings. To generate the most highly ranked password, draw a password $w_1$ from $\mathcal{D}$. Next, let $\mathcal{A}_1 = \mathcal{P} \setminus \{w_1\}$, and draw the next most preferred password $w_2$, where $w_2 = w$ with probability $p(w, \mathcal{A}_1)$. In the following round we ban $w_2$ to obtain a policy $\mathcal{A}_2 = \mathcal{A}_1 \setminus \{w_2\}$, and so on, until all passwords have been banned.
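The two user models, and the simulation of the normalization model inside the ranking model described above, can be sketched as follows. This is a minimal illustration under stated assumptions: distributions are plain dictionaries with strictly positive probabilities, rankings are Python lists, and all function names are hypothetical rather than taken from the paper.

```python
import random


def ranking_distribution(rankings, allowed):
    """Ranking model: every user picks her highest-ranked allowed password."""
    counts = {}
    for ranking in rankings:
        choice = next((w for w in ranking if w in allowed), None)
        if choice is not None:
            counts[choice] = counts.get(choice, 0) + 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def normalization_distribution(base, allowed):
    """Normalization model: renormalize the base probabilities of allowed passwords."""
    mass = sum(p for w, p in base.items() if w in allowed)
    return {w: p / mass for w, p in base.items() if w in allowed}


def sample_ranking(base, rng):
    """Draw a ranking by repeatedly sampling from the renormalized distribution
    and banning the sampled password (assumes all probabilities are positive)."""
    remaining = dict(base)
    ranking = []
    while remaining:
        words = list(remaining)
        weights = [remaining[w] for w in words]
        # random.choices normalizes the weights, which is the renormalization step.
        w = rng.choices(words, weights=weights, k=1)[0]
        ranking.append(w)
        del remaining[w]
    return ranking


base = {"a": 0.5, "b": 0.3, "c": 0.2}
rng = random.Random(0)
rankings = [sample_ranking(base, rng) for _ in range(20000)]
print(normalization_distribution(base, {"b", "c"}))
print(ranking_distribution(rankings, {"b", "c"}))
```

With enough sampled rankings, the empirical ranking-model distribution under a policy should be close to the renormalized distribution under the same policy, which is the equivalence the paragraph above argues for.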

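Given $k$, our goal is to find a policy $\mathcal{A}^*$ such that $p_k(\mathcal{A}^*) \le p_k(\mathcal{A})$ for all policies $\mathcal{A}$. When $k = 1$ this goal is equivalent to maximizing the minimum entropy. If $p_k(\mathcal{A}') \le c \cdot p_k(\mathcal{A})$ for all policies $\mathcal{A}$ then we say that $\mathcal{A}'$ is a $(k, c)$-approximation. To simplify notation we sometimes use $c$-approximation instead of $(1, c)$-approximation.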

3 Ranking Model: Complexity Results

In this section we consider the complexity of finding the optimal password composition policy in the more general ranking model when the organization is given complete information about users’ preferences. Specifically, the organization is given the rankings of every user.

Our first result is for the positive rules setting. Given positive rules $R_1, \ldots, R_m$ we show that an optimal policy can be computed efficiently for constant values of $k$ (see Theorem 3.1). In fact, for the special case $k = 1$ we present a very simple algorithm that suffices. Both algorithms can be easily extended to the less general normalization model. Our algorithms are based on three simple ideas: (1) Reduced Preference Lists: each preference list can be efficiently reduced to a short (length at most $m$) preference list. (2) Guess and Check: start by guessing the ‘structure’ of the optimal solution and find the resulting solution. (3) Iterative Elimination: find the most popular password $w$ and eliminate all positive rules that contain $w$. Our sampling algorithms are based on the same core ideas.

Unfortunately, the picture is different in the negative rules even when is a constant. Given negative rules we show that it is hard to even -approximate . Also, for non-constant values of we show that it is hard to compute in the singleton rules setting, which immediately implies hardness in both the positive rules setting and in the negative rules setting. Given a stronger complexity assumption known as the Unique Games Conjecture [17] it is also hard to -approximate in the singleton rules setting for some constant . However, our hardness results do not rule out the possibility of a -approximation for a larger constant .

3.1 Positive Rules: Efficient Algorithm for Constant $k$

We first show that an optimal policy can be computed efficiently for constant values of $k$ in the positive rules setting. In this section the organization is given $m$ positive rules $R_1, \ldots, R_m$ as well as $n$ preference lists $\ell_1, \ldots, \ell_n$. We assume that the organization can efficiently query the preference lists (e.g., given a set $S$ of active rules the organization can efficiently find user $i$’s preferred password given the policy $\bigcup_{j \in S} R_j$).

We elaborate on the key algorithmic ideas listed above. First, we can efficiently reduce each preference list $\ell_i$ to a list $\ell_i'$ of at most $m$ passwords (Claim 1). While the reduced list $\ell_i'$ is much shorter than $\ell_i$, it is still sufficient to determine user $i$’s preferred password given the policy defined by any set $S$ of active rules. We use $\mathcal{P}'$ to denote the reduced space of potential passwords.

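Input:
Preference List: $\ell$
Positive Rules: $R_1, \ldots, R_m$
Initialize: $S \leftarrow \{1, \ldots, m\}$, $\mathcal{A} \leftarrow \bigcup_{i \in S} R_i$, $\ell' \leftarrow$ empty ranking.
while $\mathcal{A} \neq \emptyset$ do
     Let $w$ be the most preferred password in $\mathcal{A}$ according to $\ell$.
     ‘Append’ the current most preferred password $w$ to $\ell'$.
     Deactivate all rules that contain $w$: $S \leftarrow S \setminus \{i : w \in R_i\}$, $\mathcal{A} \leftarrow \bigcup_{i \in S} R_i$.
return $\ell'$
Algorithm 1 Reduce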
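Claim 1

Algorithm 1 makes at most $m$ queries to $\ell$ and membership queries, and outputs a reduced preference list $\ell'$ over at most $m$ passwords such that for every set $S \subseteq \{1, \ldots, m\}$ of active rules the most preferred password in $\bigcup_{i \in S} R_i$ is the same under $\ell'$ as under $\ell$.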

{proof}

Clearly, the algorithm’s main loop iterates at most times because for each we eliminate at least one rule (e.g., ), so the bound on queries and the length of are immediate. (Because we assume that we can query efficiently Algorithm 1 is also efficient.) By construction we have for each . Fix any . Let be such that yet and let be the most preferred word in out of all words in . If it is the case that , then is the most preferred word in too and we’re done. Otherwise, which means that removing the set creates a set s.t. , contradiction.
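The reduction step can be sketched in code. This is a minimal illustration, assuming a preference list is a Python list ordered from most to least preferred and rules are sets; the helper `top_allowed` and the function name `reduce_preference_list` are hypothetical names introduced here, and the toy instance is not from the paper.

```python
def top_allowed(ranking, allowed):
    """Most preferred allowed password in the ranking, or None if there is none."""
    return next((w for w in ranking if w in allowed), None)


def reduce_preference_list(ranking, rules):
    """Sketch of the Reduce procedure for positive rules.

    Repeatedly records the user's favorite allowed password and then
    deactivates every rule containing it, so at most len(rules) passwords
    are recorded.
    """
    active = set(range(len(rules)))
    reduced = []
    while active:
        allowed = set().union(*(rules[i] for i in active))
        w = top_allowed(ranking, allowed)
        if w is None:
            break  # the remaining active rules allow nothing in this ranking
        reduced.append(w)
        # Each iteration removes at least one rule, since w lies in an active rule.
        active = {i for i in active if w not in rules[i]}
    return reduced


# Hypothetical toy instance.
ranking = ["a", "b", "c", "d", "e"]
rules = [{"a", "b"}, {"b", "c"}, {"d"}]
print(reduce_preference_list(ranking, rules))
```

The point of the reduction is that, for any set of active positive rules, the short list and the full list agree on the user's choice, so later steps never need the full ranking.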

Second, the “guess and check” idea means that our algorithm starts by guessing what the optimal solution looks like (e.g., what the most popular passwords will be in the optimal solution and what the probability of the ’th most popular password is). There are at most potential solutions to brute-force try. As we show, for each solution, it is easy to figure out which sets must be eliminated.

Input:
Preference Lists:
Positive Rules:
Integer
Initialize: Candidate Solutions
for  do
     
. Reduced Password Space
for all  with s.t. and  do
     
     while  and s.t  do
          Ban because it is inconsistent with guess      
     if  for all  then
               return
Algorithm 2 GuessAndCheck
{theorem}

Algorithm 2 runs in time polynomial in , and outputs a set of positive rules such that

for every other set . {proof} It is evident that the running time of the algorithm is since we only have potential solutions to try.

Let denote an optimal solution and let denote the most popular passwords in this solution. Suppose we start with the correct guess ( and is the probability of the ’th most popular password), then we claim that our algorithm must produce the optimal solution. In particular, we maintain the invariant that until we converge to the optimal solution. Clearly, this is true initially — before we have eliminated any passwords.

Suppose that the invariant holds and that our algorithm bans a password by deactivating all rules in that contain . Then by the definition of our algorithm we must have . If then by Equation (1) we have

which contradicts the choice of . Therefore , so all rules that contain it are deactivated in and the invariant still holds. By definition Algorithm 2 terminates when every password has probability at most . Because our invariant still holds we can apply Equation (1) again to get

Hence, is an optimal solution.

For the special case $k = 1$ the simple algorithm IterativeElimination (Algorithm 3) suffices. The basic idea is very simple: iteratively eliminate the most popular password $w$ by deactivating all positive rules that contain $w$. We repeat this process until no passwords remain. We claim that one of the solutions along the way was the optimal solution.

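Input:
Preference Lists: $\ell_1, \ldots, \ell_n$
Positive Rules: $R_1, \ldots, R_m$
Initialize: $S_0 \leftarrow \{1, \ldots, m\}$, $t \leftarrow 0$
while $S_t \neq \emptyset$ do
     $w_t$ is the most popular allowed password under the policy $\bigcup_{i \in S_t} R_i$.
     Deactivate all rules that contain $w_t$: $S_{t+1} \leftarrow S_t \setminus \{i : w_t \in R_i\}$, $t \leftarrow t + 1$.
return $S_j$ where $j$ minimizes the probability of the most popular password among the policies considered
Algorithm 3 IterativeElimination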
{theorem}

Algorithm 3 outputs a set of positive rules such that

{proof}

Let denote the optimal policy. Clearly if then our algorithm returns because that is the first set we try. Otherwise, . Let be the last set our algorithm considers that has the property that . Again, if , our algorithm returns . Let be the most popular word in , and because of optimality .

Now, because we modify to not contain in the next iteration, then the most popular word in , has to belong to some rule where . Therefore , and by the definition, the most popular word in satisfies .

But observe, because , we must have that is at least as popular in . Indeed, if is a preference list where we disallowed and the most preferred word is , then as long as we disallow more words but keep allowing the word remains at the top of the list. Therefore, . Combining together all inequalities we get , which means our algorithm returns .
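The iterative elimination idea for $k = 1$ can be sketched as follows. This is a minimal illustration under stated assumptions: every user is given as a full preference list, rules are Python sets, ties for the most popular password are broken arbitrarily, and the function and variable names are hypothetical rather than the paper's.

```python
from collections import Counter


def top_allowed(ranking, allowed):
    """Most preferred allowed password in the ranking, or None if there is none."""
    return next((w for w in ranking if w in allowed), None)


def iterative_elimination(rankings, rules):
    """Sketch of IterativeElimination (k = 1, positive rules).

    Returns (active rule indices, probability of the most popular password)
    for the best policy seen while repeatedly banning the most popular password.
    """
    active = set(range(len(rules)))
    best_active, best_p = None, float("inf")
    while active:
        allowed = set().union(*(rules[i] for i in active))
        choices = [top_allowed(r, allowed) for r in rankings]
        counts = Counter(c for c in choices if c is not None)
        if not counts:
            break  # the active rules allow no password that any user ranks
        w, count = counts.most_common(1)[0]
        p = count / len(rankings)
        if p < best_p:
            best_active, best_p = set(active), p
        # Positive rules leave only one way to ban w: drop every rule containing it.
        active = {i for i in active if w not in rules[i]}
    return best_active, best_p


# Hypothetical toy instance with six users and three rules.
rules = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
rankings = [list("abcd"), list("acbd"), list("adcb"),
            list("bcda"), list("cdab"), list("dcba")]
print(iterative_elimination(rankings, rules))
```

The loop runs at most once per rule, because each iteration deactivates at least one rule, so only a linear number of candidate policies is ever examined.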

3.2 Singleton Rules: Hardness for Large $k$

Now we turn our attention to the problem of optimizing for large values of . Theorem 3.2 says that unless no polynomial time algorithm can compute even with singleton rules. If we are willing to make the Unique Games Conjecture (UGC) [17] then it is hard to even -approximate for some constant . These results immediately imply hardness in both the positive and negative rules setting because these settings are a generalization of the singleton rules setting.

{theorem}

Unless there is no -algorithm that gets as input an arbitrary set of preference-lists over and an integer , and outputs the optimal in the singleton rules setting. {proof} We prove the theorem using a reduction from the Vertex-Cover problem. Given a graph over vertices and edges and an integer , we first define

and observe that . We also construct the following preference-lists, where for every edge we have the two lists:

where the choice of passwords below position is arbitrary, but both rankings must be identical from position onwards. Finally, we set .

Given a policy , we denote all banned words as . We denote by as the set of words that at least one user ranks first after banning all words in . Observe, . Using this notation, we show this reduction indeed proves -hardness.

First, suppose has a vertex cover of size . Then by banning all passwords we now have , because for every either or are banned, so the word appears at the top of at least one of the two lists . Therefore, the preference-lists induce a distribution whose support contains words, thus .

Conversely, suppose all vertex covers of are of size at least . Let be any set of banned words. Clearly, if then the distribution induced by the preferences-lists has support of size at most , which means that . Otherwise, , and we denote the set of vertices . Observe, since any vertex cover of must contain vertices, then there has to be at least edges that does not cover (since we can always complete to a vertex cover by adding one vertex from each uncovered edge). Therefore, there have to be at least words that do not appear at the top of any preference list. We conclude that the distribution induced by the preference-lists has a support of size at most

thus .

From the same reduction described in Theorem 3.2 we get -hardness of approximation. While there are sub-exponential time algorithms to solve the Unique Games problem [2], there are no known polynomial time algorithms. Many famous approximation hardness results are based on the Unique Games Conjecture (e.g., hardness for vertex cover [18]). Our reduction relies on a result in [3], which says that vertex cover is hard to approximate up to a (say) -factor even on bounded degree graphs. Because we start with a bounded degree graph we can argue that each password in our reduction appears at the top of at most preference-lists for some constant . See the appendix for a formal proof. {theorem} There exists a constant such that it is -hard for a -time algorithm to -approximate the optimal in the singleton rules setting and the rankings model.

3.3 Negative Rules: Hardness of Approximation for $k = 1$

We next turn to negative rules, where we show that the problem is extremely difficult even for $k = 1$. Though the proof appears in the appendix, it is quite interesting and we encourage the reader to take a look.

{theorem}

Let . Unless there is no polynomial time algorithm (in ) that approximates to a factor of in the negative rules setting and the rankings model.

4 Normalization Model: Complexity Results

In this section we focus on complexity results for the normalization model. Here the structure of the input to our problem is a bit different: for each password $w$ we are given the probability $p(w)$ that $w$ is selected by a random user when no password is banned. Note that now we can give the distribution explicitly because it requires $N$ numbers (whereas a distribution over rankings requires $N!$ numbers). This distribution induces a distribution over $\mathcal{A}$ for any password composition policy $\mathcal{A}$ by normalizing probabilities, as explained in Section 2.

Because the normalization model is a special case of the ranking model our algorithms for the ranking model can also be applied in the normalization model. The question is whether or not the hardness results carry over.

We first consider the singleton rules setting with large $k$, and show that we can compute an optimal policy in time polynomial in $N$ (Theorem 4.1). This result separates the normalization model from the ranking model (e.g., compare Theorems 4.1 and 3.2). However, it does not extend to the positive rules setting. In fact, we show that optimizing $p_k$ is NP-Hard when $k$ is a parameter (Theorem 4.3).

With negative rules we show that it is hard to -approximate (Theorem 4.2). However, we cannot rule out the possibility of an efficient -approximation algorithm for some constant in the normalization model (recall that Theorem 3.3 ruled out the possibility of a -approximation algorithm in the ranking model for any ).

4.1 Singleton Rules: Efficient Algorithm for large

We present SortAndOptimize — an efficient algorithm to optimize in the singleton rules setting for any value of . The key intuition behind our algorithm is that if is the most likely password then will remain the most likely allowed password unless we ban it — a property that does not hold in the rankings model. A formal proof of Theorem 4.1 can be found in the appendix. {theorem} For every , Algorithm 4 computes in the singleton rules setting of the normalized probabilities model, in time .

Input:
Password space and a probability distribution over .
Integer .
Sort the words in from highest to lowest probability, .
return the set , where minimizes the ratio
Algorithm 4 SortAndOptimize
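The following is a minimal Python sketch of the sort-then-scan idea behind Algorithm 4, not a definitive implementation. It assumes that the ratio minimized in the last step is the probability mass of the k most likely allowed passwords divided by the total mass of all allowed passwords, where the allowed set is a suffix of the sorted list. The function name `sort_and_optimize` and the dictionary input format are hypothetical.

```python
def sort_and_optimize(probs, k):
    """Return (allowed_passwords, ratio) for the best suffix of the sorted list.

    probs: dict mapping password -> probability (need not sum to exactly 1).
    k: number of guesses available to the adversary.
    Assumption: the objective is the normalized mass of the k most likely
    allowed passwords.
    """
    # Sort passwords from highest to lowest probability.
    words = sorted(probs, key=probs.get, reverse=True)
    p = [probs[w] for w in words]
    n = len(p)

    # suffix[i] = p[i] + ... + p[n-1], so every ratio costs O(1) to evaluate.
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + p[i]

    best_i, best_ratio = None, None
    for i in range(n):
        if suffix[i] <= 0:
            break  # nothing with positive probability remains
        top_k = suffix[i] - suffix[min(i + k, n)]
        ratio = top_k / suffix[i]
        if best_ratio is None or ratio < best_ratio:
            best_i, best_ratio = i, ratio

    if best_i is None:
        raise ValueError("no password has positive probability")
    return words[best_i:], best_ratio


if __name__ == "__main__":
    example = {"a": 0.5, "b": 0.2, "c": 0.1, "d": 0.1, "e": 0.1}
    print(sort_and_optimize(example, 1))
```

The scan only considers policies that ban a prefix of the sorted list, which reflects the intuition stated above: the most likely password stays the most likely allowed password until it is banned. After sorting, the scan itself is linear in the number of passwords.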

4.2 Negative Rules: Hardness for

We next prove an inapproximability result that is somewhat weaker than the one that we obtained for the more general ranking model.

{theorem}

There exists some constant such that unless no polynomial time algorithm (in ) can -approximate in the negative rules setting and the normalization model.

We will require the following construction; the proof is given in the appendix.

{lemma}

Fix and such that . There exists a domain of size and a family of sets, , such that each set in the family contains elements, and for every of size , we have that the size of the union . This domain can be constructed in randomized time.

That is, each set in this family contains exactly the same fraction of the domain, and furthermore — any union of sets has the property that its cardinality is proportional to .

{proof}

[of Theorem 4.2] We reduce from Set-Cover, one of the classic NP-Complete problems [16]. We are given sets , universe , and an integer , and we are asked whether there is a set of size such that .

It is a known fact that there exist Set-Cover instances, with all polynomially dependent of each other, that are hard to approximate to a factor of  [1]. That is, on this particular family of instances, it is -hard to distinguish whether there exists a cover of size or all covers have size .

We now describe the reduction. Given a -Set Cover instance, we set and construct a domain and sets as in Lemma 4.2. We then create the following password-banning instance. First is the union of with additional disjoint words denoted . Now, for each set in the Set-Cover we add a rule where . Finally, we set the words’ probabilities as follows. Fixing some arbitrarily small , we set for every the probability , and for every we set the probability .

Without loss of generality we can assume that (because, for example, we can take copies of the original ). Therefore, any policy that bans all of yet leaves a constant (say ) fraction of has , whereas any policy that keeps even one of the words in has . Therefore, if the Set-Cover instance has a cover of size , then a -approximation of the optimal banning-policy must find a cover for . We will assume from now on that our Set-Cover instance is such that it has a cover of size . (Indeed, if then the instance is no longer -hard, since the greedy algorithm must return a cover of size which causes us to deduce that the optimal cover must have size .)

So now, suppose our Set-Cover instance has a cover of size . Then the respective union of rules bans every password in and no more than words of (we get an upper bound by multiplying the size of each set by the number of sets). This leaves a collection of equally likely words, so . In contrast, if all covers of our Set-Cover instance have size (where, because we assume some cover has size , we have ,) then any collection of rules that bans all words in must also ban at least words out of . This leaves at most words in and so . Denoting the latter constant as , we have that any approximation of the optimal banning-policy indicates the existence of a cover of cardinality .

4.3 Positive Rules: Hardness of Approximation for Large

While we can show that it is possible to optimize in the singleton rules setting, our result does not extend to the more general positive rules setting. We are able to show that it is NP-Hard to compute . However, our reduction does not imply approximation hardness, so we cannot rule out the existence of a PTAS.

{theorem}

Unless there is no polynomial time algorithm (in ) which outputs in the positive rules setting and the normalization model.

The theorem’s proof is relegated to the appendix.

5 Efficient Sampling Algorithms

In a sense, our complexity results are not “realistic”, and in particular in the ranking model our positive algorithmic results assume access to each user’s full preferences. Moreover, some algorithms are allowed to run in polynomial time in the number of passwords , which can be huge. In this section we use our complexity results as guidelines in the design of practical sampling algorithms.

In more detail, we are given oracle access to rules (e.g., we can ask whether or not a password ) and we are allowed to sample from the distribution induced by the password composition policy for any . Less formally, a sample is equivalent to asking a random user what her favorite password is given the current policy.

We will work in the more general ranking model, so there is essentially only one positive result we can build on: Theorem 3.1, a polynomial time algorithm for constant in the positive rules setting. When adapting this algorithm to the sampling setting, we cannot expect it to work perfectly due to the inherent uncertainty of this domain. Instead we expect the algorithm to find an -optimal password composition policy with probability at least , for any given and . Crucially, the number of samples must not depend on the number of passwords , and must have a polynomial dependence on the other parameters.

Formally, we let denote the optimal collection of positive rules to activate (for all , ). Our goal is to find a -approximation to , that is, such that , with probability .

We first present Algorithm 5 that achieves our goal for ; this algorithm is an adaptation of Algorithm 3.

Positive Rules:
Input: ,
Initialize:
while  do
     Sample: Draw samples according to the distribution
     
      for each .
      is the most frequently sampled password
      is our estimation of
     if  then return The current solution is already sufficiently good
     else
          Deactivate all rules that contain
               return where
Algorithm 5 SampleAndEliminate
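The loop structure of Algorithm 5 can be sketched in Python as a simulation in the normalization model, where sampling from the induced distribution is emulated by drawing from a known base distribution restricted to the allowed passwords. Several details here are assumptions made for illustration: the stopping test compares the estimated probability of the most frequent password against a caller-supplied `threshold`, the returned policy is the visited policy with the smallest estimate, and rules are represented as named predicates. The function name `sample_and_eliminate` is hypothetical.

```python
import random
from collections import Counter


def sample_and_eliminate(base_probs, rules, num_samples, threshold, rng=None):
    """Simulated sample-and-eliminate loop for positive rules.

    base_probs: dict password -> probability with no policy in force.
    rules: dict rule_name -> predicate; a password is allowed if any
           active rule accepts it (positive rules setting).
    threshold: assumed stopping value for the estimated top probability.
    Returns (best_active_rules, estimated_top_probability).
    """
    rng = rng or random.Random()
    active = set(rules)  # start with every rule activated
    best_policy, best_est = None, float("inf")

    while active:
        allowed = [w for w in base_probs
                   if any(rules[r](w) for r in active)]
        if not allowed:
            break
        # Emulate asking num_samples random users for their password.
        weights = [base_probs[w] for w in allowed]
        sample = rng.choices(allowed, weights=weights, k=num_samples)

        # The most frequently sampled password and its estimated probability.
        top_word, top_count = Counter(sample).most_common(1)[0]
        estimate = top_count / num_samples
        if estimate < best_est:
            best_policy, best_est = frozenset(active), estimate

        if estimate <= threshold:
            break  # the current solution is already sufficiently good
        # Deactivate all rules that contain the most frequent password.
        active = {r for r in active if not rules[r](top_word)}

    return best_policy, best_est


if __name__ == "__main__":
    base = {"abc": 0.7, "abc1": 0.1, "abd2": 0.1, "abe3": 0.1}
    rules = {
        "min_len_3": lambda w: len(w) >= 3,
        "has_digit": lambda w: any(c.isdigit() for c in w),
    }
    print(sample_and_eliminate(base, rules, 2000, 0.5, random.Random(0)))
```

Each iteration deactivates at least one rule, so the loop runs at most as many times as there are rules, and the number of samples drawn does not depend on the size of the password space.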
{theorem}

Algorithm 5 runs in polynomial time in , requires samples and returns a -approximation of with probability at least .

{proof}

Let

denote the event that our probability estimates are off during iteration . Claim 5 bounds the probability of any bad event. The proof of Claim 5 can be found in the appendix. The proof involves bucketing the passwords based on their probability, applying Chernoff Bounds to upper bound the probability of a bad estimate for our passwords in each bucket, and repeatedly applying union bounds.

Claim 5

For the rest of the analysis we assume that no bad event occurs. Let and suppose that . Clearly, this is true when . If then so that . Hence, and the property is maintained for at least one more iteration. If instead then we have so for each we have . We conclude that the solution is a -approximation.

We next explain how to extend Algorithm 2 to -approximate the optimal for any constant .

{theorem}

There is an algorithm which runs in polynomial time (in , ), takes a polynomial number of samples, and returns a -approximation of with probability at least .

{proof}

[sketch] To extend Algorithm 2 to -approximate for constant we need one more idea. We cannot simply obtain a reduced password space by reducing preference lists because we can only sample from our distribution. Notice that for any such that we have so to obtain a -approximation it is sufficient to limit our attention to passwords in the following set

We can obtain a superset of by sampling. For each positive rule we draw independent samples from the distribution and set

Intuitively, a password is included in if and only if our estimated probability is sufficiently large. Let . For a sufficiently large sample size we can apply Chernoff Bounds to argue that with probability (1) is small, i.e., , and (2) .

6 Experiments

To demonstrate how our ideas could apply in a real-world scenario, we simulated runs of Algorithm 5 by sampling with replacement from the RockYou leaked password set [15]. The set contains over 32 million passwords with a frequency distribution similar to that of many other password sets [4]. Note that all results presented here are limited by the dataset and assume the normalization model. Working in the normalization model is crucial because we cannot ask the RockYou users for their preferred password under a specific policy; an initial distribution over  — which is available to us — is sufficient though, because it induces a distribution for any policy .

We selected 21 positive rules that mirror password composition rules commonly used in practice, and looked at sample sizes of 100, 500, 1000, 5000, and 10000. The rules included length requirements, character class requirements, combinations of requirements, a dictionary check, etc. (See Appendix C for a complete listing of the rules we selected.) For each run with a particular value of , the algorithm returns a policy for which we can measure in the original dataset and compare with the optimal , determined from running Algorithm 3 on the original dataset. We performed 500 runs for each of the five values of .

To gain an understanding of how policies based on negative rules perform, we took the complement of the 21 positive rules selected above to get 21 negative rules. We then determined the optimal negative rules policy by calculating via brute-force. This was required because we have no equivalent to Algorithm 3 for negative rules. With this baseline in hand, we designed two naïve algorithms, similar in spirit to Algorithm 5. There are multiple ways to discard a password in the negative rules setting, and one algorithm makes this decision randomly while the other bans the smallest subset as determined from the current sample. Again, 500 runs were performed for each .

6.1 Baselines

\tbl

Baseline probabilities for the RockYou dataset

Baseline                               Probability of most frequent password   Policy
Mean across negative rules policies    1.3
Mean across positive rules policies    1.0
All passwords allowed (no policy)      9.2
One positive rule ()                   6.8                                     8 chars, 1 upper, 1 digit
Optimal policy with positive rules     4.4                                     14 chars OR 2 symbols OR 8 chars, 1 upper, 1 digit
Optimal policy with negative rules     1.4                                     10 chars AND 2 digits AND 1 symbol AND 1 lowercase AND not in dictionary

\tbl

Performance of Sampling Algorithms with Positive Rules

Sample Size   mean   min   % Optimal
100           6.8    1.2
500           9.7          2%
1000          9.5          10%
5000          6.0          14%
10000         5.7          19%

\tbl

Performance of Sampling Algorithms with Negative Rules

              Random Decision   Ban Smallest
Sample Size   mean    min       mean    min
100           6.8     1.2       7.2     2.3
500           4.4     6.3       9.0     2.3
1000          4.3     4.5       8.6     2.3
5000          6.3     4.5       9.2     9.2
10000         7.2     4.5       9.2     9.2

We examined several baselines for comparison with our algorithm. Table 6.1 shows these baselines, the probability of the most frequent password in the resulting policy, and the optimal policy as a union or intersection of rules (for clarity, the complement of the union of negative rules is shown as the intersection of positive rules).

As shown in Table 6.1 from the means across policies, randomly selecting a policy from the power set of rules can be worse than having no policy. The “one rule maximum” baseline was selected because, if decided based on sampling, only distributions need be sampled. Our efficient algorithm requires the same amount of sampling, but can find the optimal policy over rather than . Also of interest is the optimal policy with negative rules, which is over 3x better than the optimal policy with positive rules. However, as shown in the following section, the performance of our sampling algorithms with negative rules was far worse than in the positive rules setting.

6.2 Performance

In the positive rules setting (see Table 6.1), the algorithm performed extremely well even at moderate sample sizes. The average policy selected with was almost 10x better than having no policy. At , the optimal policy was found 10% of the time (50 out of 500 times).

In the negative rules setting (see Table 6.1), however, neither algorithm found the optimal policy. The “Ban Smallest” heuristic, when faced with a choice between multiple subsets that contain the most likely password, decides to ban the smallest available subset, disrupting the space the least. This might seem like an intuitively good choice but, in fact, it fails to find a better policy than the empty set at large sample sizes. The randomized algorithm does better (it cannot actually do worse) but still has much worse average case performance than using our efficient algorithm with positive rules.

7 Discussion

We conclude by discussing some key points.

Where do the rules come from? Throughout the paper we have assumed that the rules (whether positive or negative) are given as part of the input; it is not up to us to find these rules. Our experiments indicate that a collection of intuitive and practical rules can already give very good results on real data. However, the question of deciding which rules should be added to our collection is outside the scope of this paper. Much like the problem of feature selection, it is an interesting problem with real-life implications, which we suspect will be very difficult in practice.

Alternate policy goals. Our goal [6] has been to minimize . Intuitively, represents the probability that an adversary with no background knowledge can successfully guess the password of a randomly selected user in tries. A small value of optimizes security guarantees against an online guessing attack in which the adversary is locked out after failed attempts to login. A much larger value of (e.g., ) is necessary to optimize security against an adversary who has obtained the cryptographic hash of a password and is able to mount a brute-force dictionary attack [25]. However, the optimal solutions for and might be completely different. One stronger goal that we might hope to achieve is to optimize both goals simultaneously. More formally, can we find a policy such that for every and every we have for some constant ? Unfortunately, the answer is no. For any constant this universal approximation goal is impossible to satisfy in the ranking model (see Theorem B).

Other natural goals include -work factor [22] and a refinement called -guesswork [4] (e.g., maximize the total number of guesses needed to compromise -fraction of the accounts). While -guesswork is a useful metric to analyze the security of 70 million Yahoo passwords [4], it may not be a desirable optimization goal for the organization because it might allow the adversary to crack up to -fraction of the accounts with relatively few guesses.

Another interesting direction is to account for an adversary with basic background information about the user (e.g., e-mail address, username, birthday). It may not always be realistic to assume that the adversary has no background knowledge because the adversary can often easily obtain some background knowledge about a user by searching for publicly available information on the internet. One approach might be to design a rule to specify different passwords for different users (e.g., the set of passwords that contain the username or birthday of the user).

Open Questions. While we were able to prove several hardness results about finding the optimal password composition policy in the negative rules setting, it is possible that these hardness results could be circumvented by making mild (hopefully realistic) assumptions about the underlying password distribution or the rules . Are there efficient algorithms to optimize in the negative rules setting given realistic assumptions? It is also possible that mild realistic assumptions could be used to circumvent the impossibility result of Theorem B, and design a universal approximation algorithm.

There are also several interesting technical questions that remain open:

  1. Normalization model with negative rules: Can we efficiently -approximate