General Framework for Evaluating Password Complexity and Strength†
† This work is sponsored by the Department of the Air Force under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.
Abstract
Although it is common for users to select bad passwords that can be easily cracked by attackers and many decades have been invested in researching alternative authentication methods, password-based authentication remains arguably the most widely used method of authentication today. Even if password use ever becomes negligible in the future, password-based authentication is currently so pervasive that the transition to new authentication methods is likely to be very long and complicated. Until that happens, to encourage users to select good passwords, enterprises often enforce policies, for example, by requiring passwords to meet a minimum length and contain special characters. Such policies have been shown to be ineffectual in practice, and despite all the available tools and related ongoing research, stolen user credentials are often cracked by attackers before victims get a chance to react properly. Also, researchers and practitioners often use the notions of password complexity and strength interchangeably, which only adds to the confusion users may have with respect to password selection. Accurate assessment of a password’s resistance to cracking attacks is still an unsolved problem, and our work addresses this challenge. Although the best way to determine how difficult it may be to crack a user-selected password is to check its resistance to cracking attacks employed by attackers in the wild, implementing such a strategy at an enterprise would be infeasible in practice. In this report we first formalize the concepts of password complexity and strength with concrete definitions that emphasize their differences. Our definitions are quite general. They capture human biases and many known techniques attackers use to recover stolen credentials in real life, such as brute-force attacks, mangled wordlist attacks, as well as attacks that make use of Probabilistic Context Free Grammars, Markov Models, and Natural Language Processing.
Building on our definitions, we develop a general framework for calculating password complexity and strength that could be used in practice. Our approach is based on the key insight that an attacker’s success at cracking a password must be defined by its available computational resources, time, function used to store that password, as well as the topology that bounds that attacker’s search space based on that attacker’s available inputs (prior knowledge), transformations it can use to tweak and explore its inputs, and the path of exploration which can be based on the attacker’s perceived probability of success. We also provide a general framework for assessing the accuracy of password complexity and strength estimators that can be used to compare other tools available in the wild. Finally, we discuss how our framework can be used to assess procedures that rely on storing password-protected information.
1 Introduction
Although many ideas have been proposed to replace passwords, they are still considered to be the standard authentication mechanism for such services as email, social networking, etc. However, password-based authentication has been a notable weak point in cyber security despite decades of effort. For example, in 2012, of network intrusions exploited weak or stolen credentials (i.e., username and/or password) [1]. Researchers and practitioners agree that having good passwords is critical in many applications, but users often choose bad passwords [2, 3]. A good password should have two key properties: (i) difficult to guess by an adversary and (ii) easy to remember; users almost always opt for the latter rather than the former [4].
So many services currently rely on password-based authentication that even if password use were to ever become uncommon, the transition to new authentication methods is expected to be long and complicated. And so, passwords have been the focal point of many studies in recent years. These studies have explored a range of related topics including password cracking algorithms [5, 6, 7, 8, 9, 10, 11], password strength and complexity [12, 13, 14, 15, 16, 17], user behavior with respect to password selection [18, 19, 20], and password creation policies [2, 3, 21, 22, 23]. In this report we focus on accurate assessment of a password’s resistance to cracking attacks, a problem that we believe still remains unsolved.
As suggested in [24], the first step in any security analysis is to define our goals and our considered threat model. Thus, in this report we first formally define password complexity and password strength. In our definitions, we consider an attacker whose goal is to recover a password that has been hidden by a particular protection function.
Informally, password complexity defines the usage of allowed characters, length, and symmetry of a password. However, in real life, attacker’s success is limited by its computational resources, time, and prior knowledge as well as how the password is stored. Password complexity does not take such details into account, so in principle it cannot provide an accurate estimate of how long it may take an attacker to crack a password. Note, however, that password complexity is still a good indicator of how difficult it may be to guess a password (i.e., how close it is to a random string) when information about how the password is protected and/or attacker’s capabilities is not readily available. Password strength on the other hand, does take such details into account, and, thus, it is a more complete notion. It may, however, be very difficult to estimate in practice because it may be impossible to accurately capture changes of such parameters as technological advances and any additional auxiliary information available to the adversary with time.
Both password complexity and strength require understanding of attacker’s use of prior knowledge, which we express in the form of a topology that bounds the attacker’s search space. A topology is defined by the attacker’s knowledge about the alphabet used to create the password, rules that it can use to tweak and explore words created by that alphabet, and the exploration path of the resultant search space. The latter can be based on the attacker’s perceived probability of success.
Our definitions are general as they capture human biases and many known techniques attackers use to recover stolen credentials in real life, such as brute-force attacks, mangled wordlist attacks, as well as attacks that make use of Probabilistic Context Free Grammars (PCFG), Markov Models, and Natural Language Processing (NLP).
Using our definitions we develop a general framework for calculating password complexity and strength that could be used in practice. We believe our framework provides a complete sense of security due to its extensive consideration of how attackers crack passwords in the wild. To summarize, the contributions of our study are as follows.

We formalize the concepts of password complexity and password strength.

We propose a novel complexity measure that models current password attacks which leverage password “topologies,” i.e., dictionaries of words together with word-mangling rules and a specification of the order in which rules are executed during an attack.

We provide a framework for empirical evaluation of password-strength and password-complexity estimators.

Finally, we discuss how our framework can be used in general to assess any procedure that relies on storing password-protected information.
The rest of this report is organized as follows. We introduce important notation and definitions which are used throughout the paper in Section 2. Section 3 formally defines notions of password complexity and password strength, and then discusses their key differences and implications. In Section 4 we present the details of our rule-based approach to calculate password complexity and strength, and we conclude in Section 5.
2 Preliminaries
In this section we introduce important notation and definitions that will be used throughout the paper.
We define a finite alphabet to be a finite set of characters, and a password to be a finite string over . We define the profinite set of all finite strings over as the set of all possible passwords over . Note that . We next define a password-generating procedure that we call a rule.
Definition 1 (Rule).
A rule, denoted by , is a function that takes as input a finite alphabet together with a finite bit string and outputs a subset of .
Here, is any auxiliary information that can be used to describe password-policy requirements, e.g., password minimum and maximum length, usage of capital letters and numbers, etc.
We can view as a logical formula specifying the requirements that users have to satisfy when selecting a password. It follows that when at least two rules, , are combined to produce a new rule, , must be interpreted as (i.e., the resulting should satisfy the requirements corresponding to and and all other ’s in this combination.)
Definition 2 (Rule Set).
A rule set with respect to a rule , denoted by , is the combination of infinitely countable passwords defined by a rule, .
Definition 3 (Combination of Rules).
The combination of any finite set of rules over some finite alphabets is the union of the outputs of those rules for any auxiliary inputs .
It is important to emphasize that the union of rules may consist of a single rule. Note that characters of do not have to come from . For simplicity, we require that does not specify use of characters not in . The simplest example of a use of a rule is to generate all possible passwords, or all possible English dictionary words with a certain maximal length as defined in .
Definition 4 (Permutation of Rules).
A permutation of any finite set of rules over some finite alphabets outputs a directed graph in which the edges impose a total ordering on the vertices for any auxiliary inputs .
Note that we use the permutation of rules and topology interchangeably.
Definition 5 (Generatable).
A finite string is generatable by a union of rules , if there exist alphabets and auxiliary inputs such that .
Now, we define a password parsing, which is a partitioning of a password into segments.
Definition 6 (Parsing).
A parsing of a finite string is a partition of its constituent characters in .
We refer to the set of all parsings of a password as .
Definition 7 (Parsing Function).
Parsing function conforms a union of rules on a password and returns a list of parsings of .
Note that if there is no predefined rule, generates all possible parsings of .
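As a minimal sketch of Definitions 6 and 7, the Python below enumerates parsings under the assumption that a parsing splits a password into contiguous, non-empty segments, and that "conforming" a union of rules means keeping only parsings whose segments all belong to that union. Both readings, and the names `all_parsings` and `conforming_parsings`, are illustrative assumptions rather than the authors' implementation.

```python
from itertools import combinations

def all_parsings(password):
    """Return every split of `password` into contiguous, non-empty segments."""
    n = len(password)
    if n == 0:
        return [[]]
    parsings = []
    # Any subset of the n-1 gaps between characters can serve as cut points.
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            parsings.append([password[a:b] for a, b in zip(bounds, bounds[1:])])
    return parsings

def conforming_parsings(password, rule_sets):
    """Keep parsings whose segments all lie in the union of `rule_sets`.

    With no rule sets given, every parsing is returned (the no-rule case).
    """
    if not rule_sets:
        return all_parsings(password)
    union = set().union(*rule_sets)
    return [p for p in all_parsings(password) if all(seg in union for seg in p)]
```

Under this reading, a password of length n has 2^(n-1) parsings when no rule is supplied, which is why restricting parsings with rules matters in practice.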
Now, we define a protection function that can be used to transform and/or store a password as a string such that the original password may be more difficult for an attacker to recover.
Definition 8 (Protection Function).
A protection function F (we drop the subscript when it is clear which alphabet is being considered) is a function that takes a finite string over and outputs a bit string.
Definition 9 (Adversary).
An adversary (we use adversary and attacker interchangeably) is defined as a nondeterministic algorithm.
We say that a function $\epsilon(n)$ is negligible in a parameter $n$ if $\forall c > 0$, $\exists N$ such that $\forall n > N$, $\epsilon(n) < n^{-c}$.
We denote a probability distribution with . A specific event in the distribution is shown as (i.e., .) The probability that an event takes a specific value, such that . The sum of over all possible values of is , , where represents the event index and denotes the total number of possible events in . We assume that every event in , , is equally probable (i.e., is uniformly distributed.) denotes the cardinality of .
2.1 Example
In order to illustrate the above definitions, we provide the following example. Suppose we define an alphanumeric alphabet, , with a password, as a finite string over . We define three rules, , , and where is a bit string that specifies passwords consisting of English dictionary words with maximum length of 8 characters, is a bit string that specifies passwords consisting of numeric characters with a maximum length of , and specifies any alphanumeric string of length , where .
Rule set is a subset of that includes to character dictionary words , e.g. hello, Goodbye, etc. Rule set is a subset of that includes to character strings of digits, e.g. 0011, 555, etc. Rule set is a subset of that includes to character strings of letters or digits, e.g. a1b2c3, zzz3, etc.
A combination of the above three rules is the union of rule outputs, and includes passwords from all three rule sets above. A permutation of gives a directed graph, in which the edges impose an ordering of the rules, e.g. . The password, , is generatable by rule combination, , because . Note that password, , is not generatable by as it is not an element of the union of rule sets .
An example parsing of is . A parsing function, , conforms the rule combination, on to produce a list of parsings . Example parsings from include , , etc.
A protection function can be any one-way function that inputs a string (password) and outputs a bit string from which it is difficult to recover the original input string. Example protection functions include common hash functions such as MD5 or SHA-1. Finally, an adversary is represented as a nondeterministic password guessing algorithm, e.g. a guessing algorithm which tries dictionary words up to 8 characters in length at random and, upon exhausting all such words, tries random numbers between and .
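The example above can be mirrored in a small Python sketch in which each rule set is an explicit set of strings, a combination is a set union, and a permutation is an ordered walk through the rule sets. The toy word list, the tiny alphabets, and all function names are assumptions chosen to keep the sets small; they are not part of the original example.

```python
from itertools import product
import string

def dictionary_rule(words, max_len):
    """Rule set: dictionary words of at most `max_len` characters."""
    return {w for w in words if 1 <= len(w) <= max_len}

def charset_rule(chars, min_len, max_len):
    """Rule set: every string over `chars` with length in [min_len, max_len]."""
    return {"".join(t)
            for n in range(min_len, max_len + 1)
            for t in product(chars, repeat=n)}

def combine(*rule_sets):
    """Combination of rules: the union of their outputs."""
    return set().union(*rule_sets)

def generatable(password, *rule_sets):
    """A password is generatable if it lies in the union of the rule sets."""
    return password in combine(*rule_sets)

def guesses_in_order(*rule_sets):
    """Permutation of rules: exhaust each rule set in the given order."""
    seen = set()
    for rule_set in rule_sets:
        for candidate in sorted(rule_set):
            if candidate not in seen:
                seen.add(candidate)
                yield candidate

TOY_WORDS = ["hello", "Goodbye", "extraordinary"]  # hypothetical dictionary
r1 = dictionary_rule(TOY_WORDS, max_len=8)         # words up to 8 characters
r2 = charset_rule(string.digits, 1, 3)             # digit strings, length 1..3
r3 = charset_rule("ab1", 1, 3)                     # tiny alphanumeric stand-in
```

Here `generatable("555", r1, r2, r3)` holds because the string lies in the digit rule set, while `generatable("hello1", r1, r2, r3)` does not, since no single rule set contains that string.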
3 Password Complexity and Strength
In this section we formally define notions of password complexity and password strength, and we then discuss their key differences and implications.
3.1 Defining Password Complexity
Recall that a password is just a finite string of characters that come from some particular finite alphabet . We define complexity of a given password over some alphabet in the context of a set of rules.
Definition 10 (Complexity).
Complexity of a password , over some alphabet , with respect to a finite set of rules is defined as the size of the smallest subset of containing that can be generated with any combination of rules in over , with any auxiliary inputs. If no combination of rules in can generate a set that contains , then ’s complexity is the cardinality of .
Notice that this definition requires specification of an alphabet and rules. This is done to capture the question of how hard it may be for an attacker to guess a password with its own set of rules and dictionaries, knowing the password policy requirements used to generate that password. Previous entropybased password complexity measures were not adequate because they did not provide the means for specifying the appropriate password search space based on precisely this kind of information, i.e., rules and dictionaries that attackers may be using. Def. 10 also captures the scenario when the attacker has no information about password policies and cannot generate the password with any of its rules and dictionaries, in which case this password may be as good as a random finite string.
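A minimal sketch of Def. 10 follows. Because a combination of rules is a union, the smallest generated set containing a password is simply the smallest single rule set that contains it. The set of all finite strings is infinite, so the fallback below bounds it with a minimum and maximum length; that bound, like the function names, is an assumption made so the value is computable.

```python
def search_space_size(alphabet_size, min_len, max_len):
    """Number of strings over the alphabet with length in [min_len, max_len]."""
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))

def complexity(password, rule_sets, alphabet_size, min_len, max_len):
    """Size of the smallest rule set containing `password`.

    Falls back to the length-bounded size of the full search space when
    no rule set contains the password.
    """
    containing = [len(rs) for rs in rule_sets if password in rs]
    if containing:
        return min(containing)
    return search_space_size(alphabet_size, min_len, max_len)
```

For instance, a password found in a 2-word dictionary has complexity 2 regardless of how long it is, whereas a password outside every rule set is scored against the full length-bounded search space.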
3.2 Defining Password Strength
We define password strength with respect to the following security experiment involving an adversary .
Definition 11 (FAT-Security Experiment).
The inputs to the FAT-experiment are an alphabet , a protection function associated with , the description of adversary A, a password over , and a time period . The description of an adversary includes all of its computational resources, its rules, and any auxiliary information. takes as input , , , , and any additional random input. The security experiment ends either after outputs a finite string or when the time elapsed since the start of the experiment exceeds , whichever comes first. (, , , , ) returns one if, within time after its start, outputs a finite string such that = . (, , , , ) returns zero otherwise.
We now give our security definition with respect to the FAT-experiment we just described.
Definition 12 (Password FAT-Strength).
We say that a password over an alphabet is FAT-secure if over all random inputs to , (, F, A, T, ) returns 0 in expectation.
Note that by providing an adversary with any auxiliary information this definition captures an attacker’s potential knowledge of the policies under which the input password was selected as well as how it could steal a multitude of additional protected passwords. The key aspect of this definition is that it requires us to consider the attacker’s capabilities as well as the protection function used to store the password. This is something that has not been captured by any previous password strength or complexity measures, with the exception of passfault [17].
passfault is a time-to-crack (TTC) estimator that takes into account the attacker’s capabilities, including rules and the password protection function. However, it is dependent on a fixed set of rules and a fixed methodology for parsing passwords. Also, it does not capture the attacker’s order of rule application, nor does it take into account advances in technology and any additional auxiliary information. We describe how to address these shortcomings later in the paper.
Note that password complexity does not take into account how the password is stored, nor attackers’ capabilities. Thus, it cannot intrinsically provide an estimate of how long it may take anyone to crack a password. However, it is a good indicator of how difficult it may be to guess a password (i.e., how close it is to a random string) when information about protection function or attacker’s capabilities is not clear (the most typical scenario when users are asked to select a password for a particular website).
Password strength, on the other hand, does take such details into account, and, thus, it is more complete. However, password strength may be very difficult to estimate realistically because it may be impossible to accurately capture changes of such parameters as technological advances and any additional auxiliary information available to the adversary with time. Any such extra information can in principle be encapsulated as auxiliary information within the description of the adversary, and we propose how this can be done later in the paper.
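The FAT security experiment of Definition 11 can be sketched in Python as follows. The adversary is modeled as any iterable of guesses, the protection function defaults to SHA-256 from the standard `hashlib` module, and the time period is a wall-clock budget in seconds; these modeling choices and the names `protect` and `fat_experiment` are assumptions for illustration only.

```python
import hashlib
import time

def protect(password):
    """Stand-in protection function: hex-encoded SHA-256 of the password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def fat_experiment(protect_fn, guesses, password, time_limit_s):
    """Return 1 if the adversary outputs a match within the time limit, else 0.

    `guesses` plays the role of the adversary: an iterable of candidate
    strings produced in the adversary's chosen order.
    """
    target = protect_fn(password)
    deadline = time.monotonic() + time_limit_s
    for guess in guesses:
        if time.monotonic() > deadline:
            return 0  # time period exceeded before a match was found
        if protect_fn(guess) == target:
            return 1  # adversary recovered a string with the same protected value
    return 0  # adversary exhausted its guesses without success
```

A password would then be treated as FAT-secure with respect to a given adversary and time period when repeated runs of this experiment, over the adversary's random choices, return 0.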
3.3 Evaluating Password Strength Estimators
To truly evaluate a password-strength estimator one must compare its estimates with respect to real password-cracking attacks. However, this may be infeasible in practice due to lack of proper equipment and time.
Although, for the purposes of empirical evaluation, well-known password-cracking tools such as John the Ripper (JtR) and Hashcat [25] can in principle be run on commodity hardware [26], their performance will not do justice to attackers’ capabilities in the wild.
Even when appropriate password-cracking hardware [27] is available, there may not be enough time. For example, it is impractical to wait for a year to see if a password may really require that long to be recovered. In that time attackers’ capabilities are likely to improve, and the password-strength estimator under test is likely to undergo significant updates. To address the timing issue, one could focus on passwords that cannot be broken within a smaller, more practical amount of time (e.g., 4 weeks). In this context we consider the following two main criteria for evaluating password-strength estimators:

Reliability The estimator does not create a false sense of security by marking weak passwords as strong.

Inclusion The estimator does not reduce the space of passwords considered to be strong by marking strong passwords as weak.
Intuitively, we do not want an estimator to overestimate or underestimate password strength. We now present definitions that make up a framework for evaluating passwordstrength estimators.
Definition 13 (FAT-Strength Estimator).
A password FAT-strength estimator is a function that takes as input alphabet , a protection function associated with , the description of adversary A, a password over , a time period , and any additional information aux, and outputs

1, in which case we say that marks as FAT-secure, or

0, in which case we say that marks as not FAT-secure.
Definition 14 (Password FAT-Strength Estimator Reliability).
We say that a password FAT-strength estimator over an alphabet is reliable over a test set , if the fraction of that marks as FAT-secure that are not FAT-secure is negligible in .
Definition 15 (Password FAT-Strength Estimator Inclusion).
We say that a password FAT-strength estimator over an alphabet is inclusive over a test set , if the fraction of that marks as not FAT-secure that are FAT-secure is negligible in .
Definition 16 (Password FAT-Strength Estimator Accuracy).
We say that a password FAT-strength estimator over an alphabet is accurate over a test set , if it is both reliable and inclusive with respect to .
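Definitions 14 to 16 suggest a simple empirical check, sketched below in Python. The sketch assumes that ground-truth labels are available for the test set (for example from running real cracking attacks within the chosen time period) and reads both fractions as fractions of the whole test set; that reading, and the name `evaluate_estimator`, are assumptions.

```python
def evaluate_estimator(estimates, ground_truth):
    """Compare estimator outputs with ground truth over a test set.

    Both arguments map each test password to 1 (FAT-secure) or
    0 (not FAT-secure). Returns the two error fractions that
    reliability and inclusion require to be small.
    """
    n = len(estimates)
    if n == 0:
        return {"reliability_error": 0.0, "inclusion_error": 0.0}
    # Marked secure but actually crackable: harms reliability.
    false_secure = sum(1 for p, e in estimates.items()
                       if e == 1 and ground_truth[p] == 0)
    # Marked insecure but actually secure: harms inclusion.
    false_insecure = sum(1 for p, e in estimates.items()
                         if e == 0 and ground_truth[p] == 1)
    return {"reliability_error": false_secure / n,
            "inclusion_error": false_insecure / n}
```

An estimator would be judged accurate on the test set when both returned fractions are small enough under whatever threshold is chosen to stand in for "negligible".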
3.4 Discussions:
We propose a general framework that can account for all possible types of password cracking attacks when calculating a password’s complexity and strength. In this section, we describe a couple of examples of how our framework can be used to capture different types of password cracking attacks.
3.4.1 Probabilistic context-free grammar:
In this rule, an attacker uses a large set of passwords from major password breaches to train a password generation model [7]. The attacker then uses the trained model to create a rule set that generates strings from a context-free grammar to crack a password. Our framework imitates the attacker’s password cracking strategy during the calculation of the password’s complexity and strength.
3.4.2 Password cracking informed by online presence:
In this rule, an attacker scrapes the social network websites (e.g., Facebook, LinkedIn, etc.) of a user to extract possible phrases from structured/unstructured text, pictures, videos, etc. which can be used to derive a password. These phrases can be combined with possible rule sets (e.g., word lists) to create a combination of rules. The attacker probably knows the alphabet associated with the password being cracked. If the attacker does not know the alphabet, it can try different alphabets or all ASCII characters. Finally, the attacker can explore all possible passwords by using different dictionaries (e.g., if the user’s Facebook page has posts in English and French, the attacker can use both dictionaries). Our framework mimics the attacker’s password cracking strategy to calculate the password’s complexity and strength.
4 Our Rule-based Approach
In this section, we present the details of our rule-based approach which uses the combinations of upper bound, lower bound, chain rule, and order-aware chain rule to calculate the complexity and strength of a password, . Unlike other schemes [13, 14, 16], our framework provides a more complete sense of security for a user while creating a password in .
4.1 Complexity and Strength Calculation Framework
The general architecture of our rule-based password complexity and strength framework is shown in Fig. 1. It takes a password as an input. A user can also provide a subset or entire , , the minimum and maximum allowable length of , strength parameters (e.g., password storage and adversarial capabilities), and order of rules. It outputs , strength estimate (i.e., or ), and normalized complexity of in relation to lower and upper bounds of complexity. Note that all inputs (orange boxes in Fig. 1) are optional. The default parser extracts parsings of and it has three options: (i) using the default algorithms , (ii) using the user-defined algorithms , and (iii) locating in a precalculated set of all possible passwords based on the input rules. The output of the default parser is either a list of parsed results of or a corresponding point of in . The complexity calculator takes the default parser’s output as an input. It may either calculate the complexity of based on the provided parsings or use precalculated and then map into the minimum search space (i.e., complexity). The complexity calculator outputs complexity and normalized complexity of . The strength calculator output is binary. means strength calculator outputs and means strength calculator outputs .
4.2 Understanding Password Complexity
In this section, we provide details of our framework and mathematical proofs to support our approach and also show that our rule-based complexity and strength calculation yields a sense of security for a given password .
The password entropy [28] is commonly used to indicate a measure of protection provided by and increases with the number of characters. The size of the set of all possible passwords with alphabet , , identifies the complexity for randomly generated passwords. The larger the password search space, the more difficult the password is to crack by brute-force attack.
Let us provide a couple of numerical examples to highlight some details of , , and the adversary’s perspective on . Assume that we want to create an eight-character password and the alphabet has only lowercase English letters (i.e., 26 characters). There are more than 200 billion possible ways to create a password (26^8 ≈ 2.09 × 10^11). If an attacker knows the allowed length of and that the password only uses lowercase letters, then at a rate of thousands to trillions of password attempts per second, it could take on the order of 10^8 seconds down to a fraction of a second, respectively, to crack the password by brute force. Note that the rate of password guessing varies widely with the adversary’s hardware capabilities. It might be primitive software on outdated hardware for an everyday attacker or a dedicated infrastructure with state-of-the-art software algorithms for a state-sponsored cyber team. If we augment the alphabet with uppercase letters, there are two orders of magnitude more possibilities than for lowercase-only passwords (52^8 / 26^8 = 2^8 = 256). With lowercase letters only, nine-character passwords offer 26 times more possibilities than eight-character passwords (i.e., 26^9 / 26^8 = 26). These examples highlight how the size of the set of all possible passwords, , changes with respect to the allowed length and the alphabet. Fig. 2 shows various combinations of lower/uppercase letters (i.e., ) and password length versus the number of different ways to create a password in log scale.
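The arithmetic behind these numerical examples can be checked with a few lines of Python. The guessing rates of 10^3 and 10^12 attempts per second are assumed stand-ins for "thousands" and "trillions", and the function names are illustrative.

```python
def count_passwords(alphabet_size, length):
    """Number of distinct passwords of exactly `length` characters."""
    return alphabet_size ** length

def brute_force_seconds(search_space, guesses_per_second):
    """Worst-case time to exhaust the search space at a fixed guessing rate."""
    return search_space / guesses_per_second

lower8 = count_passwords(26, 8)   # lowercase only, eight characters
mixed8 = count_passwords(52, 8)   # lowercase + uppercase, eight characters
lower9 = count_passwords(26, 9)   # lowercase only, nine characters

slow = brute_force_seconds(lower8, 1e3)    # assumed "thousands" per second
fast = brute_force_seconds(lower8, 1e12)   # assumed "trillions" per second
```

With these values, `lower8` is 208,827,064,576, `mixed8` is 256 times larger, and `lower9` is 26 times larger, which matches the growth factors quoted in the text.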
4.3 More Information Leads to Higher Predictability
In this section, we establish the mathematical grounds for our rule-based complexity and strength calculation. Our goal is to prove that accumulating information about a user’s password increases the predictability of the password by an adversary, as we show in the following theorem.
Theorem 1 (Information Gain).
When an adversary gains more information about the possible password space , the complexity of a password decreases.
To prove this theorem, we need to prove the following lemmas.
As given in Definitions 1 and 2, a rule set , where , is a subset of . The following lemma shows that combination of any number of rules results in a rule which is a subset of .
Lemma 1 (Union).
Any combinations of rules results in a rule whose corresponding rule set .
Proof.
We prove this lemma by induction,
(1) 
Base case: Let such that . Assume that we have two rule sets and and they operate on the alphabets and , respectively. and generate and , respectively. and the cardinality of the union rule set . Note that is less than or equal to since any given rule set is not necessarily a proper subset of
is the combination of these two rule sets such that . We can claim that the union of two rule sets results in a rule set in a profinite set .
Inductive hypothesis: Let be given and suppose Eqn. 1 is true for . Suppose that rule sets operate on the alphabets and , respectively. , , , and generate , , , respectively. and the cardinality of the union rule set . Note that is less than or equal to since any given rule set is not necessarily a proper subset of
Induction step: Let us use the assumptions from the inductive hypothesis and show that the result holds for . , , , , and generate , , , , respectively. ,,, and the cardinality of the union rule set . Note that is less than or equal to since any given rule set is not necessarily a proper subset of
Conclusion: By the principle of induction, Eqn. 1 is true ∎
Let us give an example for Lemma 1 to give insight about the meaning of union of two rules. Assume that we have two rules such that represents the rule of dictionary words and is the numbers from zero to nine. Passwords only composed of dictionary words are in and passwords with numbers are in . When we combine these rules, , the resulting rule represents passwords with dictionary words and numbers in .
Now, let us prove that can be partitioned.
Lemma 2 (Countably Infinite).
Profinite password search space can be partitioned into a countably infinite set such that where represents a rule or a combination of rules.
Proof.
is defined as a set of all finite strings over an alphabet ,
(see Section 2.)
Suppose that we have set of rules, we will now present a procedure to transform this set into a disjoint set of the rules.
To show , it is sufficient to produce a one-to-one map , where is a pairwise disjoint countable (i.e., countably infinite) set and as follows:
To see that the sets are pairwise disjoint, let us consider and , where . If . This implies that , . Thus, .
Let us prove that . Since , we certainly have . Conversely, if , it means that is in at least one of ’s. Assume that , where is the smallest set that can be a member of. Then . We know that so it can be discarded in the definition of . , thus, .
Let us define a function such that if , then as given in Definition 5. For , there is exactly one since ’s are pairwise disjoint as shown above. This means that there is no ambiguity in ’s definition. We can claim that is one-to-one. is countable since . Thus, can be partitioned into countably infinite sets such that . ∎
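The procedure for turning a sequence of rule sets into pairwise disjoint sets can be illustrated on finite inputs. The explicit formula is not legible in the text above, so the Python sketch below assumes the standard construction, in which each new set keeps only the elements that no earlier set already contains; the name `disjointify` is hypothetical.

```python
def disjointify(rule_sets):
    """Turn a sequence of sets into pairwise disjoint sets with the same union.

    The i-th output is the i-th input minus everything that appeared in
    inputs 0 .. i-1, so each element lands in the first set containing it.
    """
    seen = set()
    disjoint = []
    for rule_set in rule_sets:
        fresh = set(rule_set) - seen
        disjoint.append(fresh)
        seen |= fresh
    return disjoint
```

Each password is assigned to exactly one output set, which is the property the proof relies on when it argues that the map from passwords to sets is unambiguous.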
Let us define the advantage of an adversary in guessing a password as the difference between probabilities of guessing with and without prior knowledge.
Lemma 3 (Prior Knowledge).
If a password has a parsing , the complexity of decreases when an attacker has prior knowledge.
Proof.
As shown in Lemma 2, can be partitioned into countably infinite disjoint subsets. Assume that prior knowledge is denoted by , where . Regardless of the alphabets and , , where and .
Let us consider a scenario in which , . In other words, an attacker knows that . Therefore, since .
The minimal knowledge an attacker can have about a password is that . Without prior knowledge, the password can be guessed with a probability of . With prior knowledge (i.e., excluding ), the password can be guessed with a probability of . Therefore, we can conclude that the complexity of decreases when there is prior knowledge. ∎
Let us recall Theorem 1 which states that the smaller the search space the smaller is the complexity of a password .
Theorem 1 (Information Gain).
The complexity of a password decreases when an adversary gains more information.
Proof.
As shown in Lemma 3, an adversary has an advantage when there is prior knowledge. Assume that and represent subsets of such that and , respectively. By definition . Regardless of the size of the subsets and and all three alphabets (), as proven in Lemma 3. Thus, we conclude that more prior information means smaller search space and the complexity of a password decreases. ∎
4.4 Rule-based Complexity Lower and Upper Bounds
Password complexity depends on the password-design process, and maximum complexity is achieved when each password character is independently drawn from a uniformly distributed alphabet, analogous to the result which shows that the maximum entropy is achieved under the same conditions [29]. The following lemma shows that independently drawn samples from a uniformly distributed alphabet provide the maximum search space cardinality among all other distributions supporting the same alphabet.
Lemma 4 (Uniformly Distributed  1).
The maximum complexity of a password is obtained if and only if each character in is pulled from a uniformly distributed input set.
Proof.
As defined in Section 2, a password is a finite string over . Thus, let us assume that

is composed of a finite number of strings such that , where and is finite.

The probabilities of is a set of positive real numbers , such that corresponds to . Note that .
We can use the inequality of arithmetic and geometric means (AM-GM inequality) [30] to prove this lemma. The AM-GM inequality states that the arithmetic mean of a list of nonnegative real numbers is greater than or equal to the geometric mean of the same list. . Equality holds if and only if are equal. Thus, each password character should be drawn from a uniformly distributed input set to obtain maximum complexity. ∎
The following lemma shows that the maximum complexity of a password can be obtained if and only if each password in is equally likely.
Lemma 5 (Uniformly Distributed - 2).
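The AM-GM step used in the proof of Lemma 4 can be written out explicitly. The symbols below are assumed for illustration only: the probabilities are written as p_1, ..., p_n and are taken to sum to one.

```latex
% AM-GM inequality for nonnegative reals p_1, ..., p_n
\[
  \frac{p_1 + p_2 + \cdots + p_n}{n} \;\ge\; \sqrt[n]{p_1 p_2 \cdots p_n},
  \qquad \text{with equality iff } p_1 = p_2 = \cdots = p_n .
\]
% Applying the constraint p_1 + ... + p_n = 1
\[
  \prod_{i=1}^{n} p_i \;\le\; \left(\frac{1}{n}\right)^{n},
  \qquad \text{with equality iff } p_i = \frac{1}{n} \text{ for all } i .
\]
```

Under this reading, the product of the probabilities is largest exactly when the distribution is uniform, which is the condition stated in the lemma.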
The maximum complexity of a randomly created password in is obtained if and only if is uniformly distributed for any finite .
Proof.
We prove this lemma by using Def. 10 and Theorem 1. In Def. 10, the complexity is defined as the size of the smallest subset of . Theorem 1 shows that when an adversary gains more information about , the complexity of decreases.
Let us assume that all passwords are generated by the same set of rules but (i.e., and ) and the probabilities of correctly guessing passwords by an adversary are , where a probability of guessing a password corresponds to . The relationship between the probabilities as and . The complexity of any password created by and are and , respectively (see Def. 10.) The complexity of , , is strictly greater than the complexities of ,, and ,, since there is an injective function, but no bijective function, from to and . The reason is that is not created by and the rest of the possible passwords (i.e., ) are not generated by . If a password , then the probability of is .
Thus, the maximum complexity can be obtained only if all passwords in are created by a rule under which all passwords are equally likely to be guessed. ∎
Let us first calculate the upper bound for a password . Password-policy requirements generally specify the minimum password length, denoted by , and the alphabet . The maximum password length can be defined in the policy, or we can take the length of the longest password, denoted by , from a password database storing the existing passwords (see Fig. 1). The upper bound for our rule-based complexity is calculated as:
(2) 
is (i.e., the cardinality of all possible passwords) and the upper bound is the same for any rule in the set of rules and a disjoint set (see in Lemma 3) of .
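As a minimal sketch of the upper bound described above, assume it counts every string over the alphabet whose length lies between the minimum and maximum password lengths. The function and parameter names are hypothetical and chosen only for this illustration.

```python
def upper_bound(alphabet_size: int, min_len: int, max_len: int) -> int:
    """Count all strings over the alphabet with min_len <= length <= max_len.

    Assumption: the upper bound is the cardinality of all possible
    passwords, i.e. the sum of alphabet_size**k over the allowed lengths k.
    """
    if alphabet_size < 1 or min_len < 0 or max_len < min_len:
        raise ValueError("invalid alphabet size or length range")
    return sum(alphabet_size ** k for k in range(min_len, max_len + 1))


# Example: lower/upper-case English letters and digits (62 symbols),
# with passwords of 8 to 12 characters.
if __name__ == "__main__":
    print(upper_bound(62, 8, 12))
```

Because Python integers are arbitrary precision, the count stays exact even for long passwords and large alphabets.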
The following equation provides a lower bound () on an adversary’s effort to guess a password based on our rule-based complexity measure:
(3) 
where . When a password is not part of a given rule set , then for the lower bound calculation .
As shown in Eq. 3, may have one of two possible outcomes. If a password is not a member of , the lower and upper bounds are equal. The complexity of equals when the password is a part of the corresponding rule set, . Note that can possibly be generated by more than one rule, in which case its complexity is the smallest cardinality among all these rules.
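The two outcomes of the lower bound can be sketched as follows. This is an illustration under stated assumptions, not the framework's implementation: each rule is modelled as a pair of a membership test and a search-space cardinality, and the toy word list and rule set are hypothetical.

```python
from typing import Callable, Iterable, Tuple

# Assumed model of a rule: (does the rule generate this password?, cardinality).
Rule = Tuple[Callable[[str], bool], int]

DIGITS = set("0123456789")
LOWER_AND_DIGITS = set("abcdefghijklmnopqrstuvwxyz") | DIGITS
TOY_WORDS = {"love", "soccer"}  # hypothetical stand-in for a real dictionary

RULES = [
    # four-digit strings
    (lambda p: len(p) == 4 and set(p) <= DIGITS, 10 ** 4),
    # dictionary words
    (lambda p: p in TOY_WORDS, len(TOY_WORDS)),
    # four-character strings over lowercase letters and digits
    (lambda p: len(p) == 4 and set(p) <= LOWER_AND_DIGITS, 36 ** 4),
]


def lower_bound(password: str, rules: Iterable[Rule], upper: int) -> int:
    """Smallest cardinality among the rules that generate the password.

    If no rule generates the password, the lower bound falls back to the
    upper bound, matching the two outcomes described in the text.
    """
    sizes = [size for generates, size in rules if generates(password)]
    return min(sizes, default=upper)
```

A password such as 1234 is generated by two of the toy rules, so the smaller cardinality is reported, while a password outside every rule receives the upper bound.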
4.5 Chain Rule Provides Complexity of Passwords Having Composite Structures
The chain rule considers the case in which a password contains more than one pattern and an adversary needs to use a combination of rules to crack the password.
Due to improved password policies and richer alphabets, passwords generally have multipart structures, such as combinations of uppercase letters, lowercase letters, numbers, and special characters. We define the rules as a part of (see Definition 1 and Lemmas 1-2), and a rule generally represents a small portion of . For example, if is lowercase English letters and numbers from zero to nine, a rule of dictionary words represents only a small portion of . Thus, it is expected that passwords are generally combinations of various rules. The calculation of the complexity should reflect an accumulation of these small search spaces defined by the combination of rules (see Def. 3) and/or a rule as shown below:
(4) 
where represents all possible combinations of rules in and . We assume that if a password is not a member of a subset , then , where . For example, might be a rule of dictionary words (dict.) or it might be a combination of two rules such as the first character of a password is a number and the rest of a password is dictionary words (e.g., )
4.6 Password Parsing Provides More Accurate Complexity Calculation
A brute-force (or exhaustive search) attack is the last resort for cracking a password since it is the least efficient method. It requires systematically trying all combinations. Brute force always cracks a password when there is no time constraint. Furthermore, if a password has a predictable structure, exhaustive search becomes feasible. As explained so far, our main goal is to provide a better sense of security for a user. Therefore, we want to be conservative with the rule-based complexity calculation. To provide better feedback to a user, our rule-based complexity engine parses the given password to extract various patterns.
A user can use the default and/or a user-defined password parsing mechanism in order to extract patterns in a given password (see Fig. 1). The parser uses an alphabet, which might be parser-specific or the common alphabet that our rule-based password complexity engine uses, to extract the patterns in a given password. Assume that a parser uses lower- and upper-case English letters and digits from zero to nine as an alphabet () to extract number-only patterns and patterns of three consecutive letters in a password. The input 1LoveSoccer can be parsed in a number of different ways, such as (1, Love, Soccer), (1, Lov, eSoccer), etc. Our rule-based complexity engine compares all these parsed results with a given rule and finds the minimum search space to calculate the complexity of the password. For example, if we have two separate rules, namely digits () and dictionary words (), to check these parsed passwords, all extracted patterns are compared to these rules to calculate password complexity. Let us look at the parsing (1, Love, Soccer). provides for 1 (see Section 4.5), and for both Love and Soccer and gives for 1, and for both Love and Soccer. For the purpose of readability, we use scale for the following calculations. Thus, . Now, let us calculate the complexity for the parsing (1, Lov, eSoccer). provides for all 1, Lov, and eSoccer and gives for 1, and for both Lov and eSoccer. Thus, (see Section 4.5). After calculating all parsed results, the complexity of the password 1LoveSoccer is .
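The parse-and-minimize idea can be sketched as below. Several choices here are assumptions made for illustration rather than the paper's definitions: the per-segment costs are multiplied (which corresponds to a sum on a logarithmic scale), unmatched segments fall back to brute force over a 62-symbol alphabet, dictionary matching ignores case, and the two-word dictionary is a toy stand-in.

```python
from itertools import combinations
from math import prod

DIGITS = set("0123456789")
TOY_DICTIONARY = {"love", "soccer"}  # hypothetical stand-in for a word list
ALPHABET_SIZE = 62                   # lower/upper-case letters plus digits


def segment_cost(segment: str) -> int:
    """Smallest search space among the rules that generate the segment."""
    costs = [ALPHABET_SIZE ** len(segment)]    # brute-force fallback
    if set(segment) <= DIGITS:
        costs.append(10 ** len(segment))       # digits-only rule
    if segment.lower() in TOY_DICTIONARY:
        costs.append(len(TOY_DICTIONARY))      # dictionary-word rule
    return min(costs)


def parsings(password: str):
    """Yield every split of the password into contiguous, non-empty segments."""
    n = len(password)
    for k in range(n):                         # k = number of cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [password[a:b] for a, b in zip(bounds, bounds[1:])]


def parsed_complexity(password: str):
    """Return (cost, parsing) with the smallest product of segment costs."""
    scored = ((prod(map(segment_cost, p)), p) for p in parsings(password))
    return min(scored, key=lambda item: item[0])


if __name__ == "__main__":
    print(parsed_complexity("1LoveSoccer"))
```

With these toy rules the cheapest parsing of 1LoveSoccer is (1, Love, Soccer). A string of length n has 2^(n-1) parsings, so exhaustive enumeration is only practical for short inputs; a real engine would need pruning or dynamic programming.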
4.7 Order-Aware Chain Rule Complexity
In brute-force cracking, an attacker tries every possible string in until one succeeds. More common methods of password cracking, such as dictionary attacks, pattern checking, and word list substitution, attempt to reduce the number of trials required and will usually be attempted before exhaustive search. In other words, there are probable paths that an attacker can try to recover a password before trying all combinations in . If we have an idea of these probable paths, for example that an attacker checks dictionary words before word list substitution, our rule-based engine can incorporate this information into the complexity calculation as shown below:
(5) 
where is generatable by and an attacker tries before , before and so on. .
Fig. 3 presents an example of a permutation of rules. In this scenario, the directed graph has three nodes and two edges, . As defined in Def. 4 and formulated in Eq. 5, the order of evaluation of a password ’s complexity is , , and then .
When the order of the rules is unknown, the complexity calculation can use the minimum over all possible orders to provide a lower bound to the user, as shown below:
(6) 
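One plausible reading of the order-aware calculation is sketched below. It is an assumption made for illustration, not a transcription of Eq. 5 or Eq. 6: the attacker is taken to exhaust each rule's search space in the given order, so the effort is the sum of the cardinalities of the rules tried up to and including the first rule that generates the password. All names are hypothetical.

```python
from itertools import permutations
from typing import Callable, Optional, Sequence, Tuple

# Assumed model of a rule: (name, generates this password?, cardinality).
Rule = Tuple[str, Callable[[str], bool], int]


def order_aware_guesses(password: str, ordered_rules: Sequence[Rule]) -> Optional[int]:
    """Worst-case number of guesses when rules are tried in the given order.

    Returns None when no rule in the sequence generates the password.
    """
    spent = 0
    for _name, generates, size in ordered_rules:
        spent += size                  # the whole rule space may be tried
        if generates(password):
            return spent
    return None


def min_over_orders(password: str, rules: Sequence[Rule]) -> Optional[int]:
    """Minimum order-aware effort over every permutation of the rules."""
    efforts = [
        effort
        for order in permutations(rules)
        if (effort := order_aware_guesses(password, order)) is not None
    ]
    return min(efforts, default=None)


EXAMPLE_RULES = [
    ("digits4", lambda p: len(p) == 4 and p.isdigit(), 10 ** 4),
    ("dict", lambda p: p in {"love", "soccer"}, 2),
    ("lower4", lambda p: len(p) == 4 and p.isalpha() and p.islower(), 26 ** 4),
]
```

Enumerating permutations grows factorially with the number of rules, so this form is only meant to show the idea for a handful of rules.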
4.8 Password FAT-Strength
Password FAT-strength is a measure of the effectiveness of a password in resisting guessing and brute-force attacks. To ensure an acceptable level of security, our framework provides the FAT-strength of a password as defined in Def. 12.
Most password strength meters categorize a password as very weak, weak, strong, or very strong [31]. They do not use the estimated time-to-crack (except Passfault), an adversary’s computational power, or the user’s online presence. However, the estimate of a password’s strength should be a function of its endurance against brute-force attacks. Our hypothesis given in Eq. 7 uses the factors that an adversary can exploit to calculate a password’s FAT-strength. For example, one or more rules can be extracted from a user’s online presence (e.g., a Facebook account). When a user creates a password using personal information that is publicly available, our framework can incorporate this customized information into the set of rules .
If , then our FAT-strength calculation will include this information in the cardinality of the complexity as given below:
(7) 
where is a function of the type of password storage (e.g., a one-way hash function), is the computational power used by an adversary to crack a password , is the number of parallel processors, and is the acceptable time-to-crack, which can be defined by a user or calculated from the password change policy. represents strong and represents a scenario in which is not strong. Note that the computational power is a function of time, since it takes Moore’s law (see Table I) into account when calculating the FAT-strength.
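Table I: Relative computing power by year under Moore’s law.
| Year | Relative Computing Power |
|------|--------------------------|
| 2015 | 1x |
| 2025 | 32x |
| 2035 | 1024x |
| 2045 | 32768x |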
Fig. 4 shows a high-level model of the FAT-strength calculation given in Eq. 7. The strength framework uses password protection methods, , and the expected lifetime of , , as auxiliary parameters. Imagine that certain rules and computational power can be modeled as adversarial capabilities. follows Moore’s law (i.e., an adversary’s computational capacity doubles every other year); however, it can also be fed into the strength framework as a different function. For example, if an adversary is a known state actor and improves its computational capacity every month, this information can be incorporated into . The complexity framework provides the cardinality of an estimated complexity of . As explained in previous sections, our framework calculates several complexities. is the minimum of all calculations if a user does not enforce a certain complexity calculation (e.g., order-aware chain rule complexity). The strength framework uses a binary hypothesis test to decide between and , which indicates whether is strong or not.
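The ingredients of this model can be sketched in a few lines. The doubling-every-two-years function reproduces the values of Table I; the decision function is only one plausible way to combine the factors named in the text (complexity, storage cost, computational power, parallel processors, acceptable time-to-crack) and is an assumption, not the paper's Eq. 7. All names are hypothetical.

```python
def relative_power(year: int, base_year: int = 2015,
                   doubling_years: float = 2.0) -> float:
    """Relative computing power, assuming capacity doubles every doubling_years."""
    return 2.0 ** ((year - base_year) / doubling_years)


def is_fat_strong(complexity: int, seconds_per_guess: float, year: int,
                  processors: int, acceptable_seconds: float) -> bool:
    """Binary test: does the estimated time-to-crack meet the acceptable time?

    Assumed model: seconds_per_guess is the cost of one guess on a single
    base-year processor (it depends on how the password is stored), and
    throughput scales linearly with processors and relative power.
    """
    guesses_per_second = relative_power(year) * processors / seconds_per_guess
    return complexity / guesses_per_second >= acceptable_seconds


if __name__ == "__main__":
    ninety_days = 90 * 24 * 3600
    for year in (2015, 2025, 2035, 2045):
        strong = is_fat_strong(10 ** 16, 1e-6, year, 1000, ninety_days)
        print(year, relative_power(year), strong)
```

A different adversary model, such as one whose capacity improves monthly, can be expressed by changing doubling_years or by substituting another function for relative_power.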
5 Conclusions
In this paper, we formalize the concepts of password complexity and password strength and propose a novel approach to calculating both, while providing a general framework for analyzing and comparing other available password strength and complexity estimators. Our framework incorporates human biases into the calculation so that a lower bound on a password’s strength and complexity can be provided to a user. The key insight we employ is that a brute-force attacker does not assume all guesses are equally likely, so one should not assume all possible passwords are equally good. As a result, our framework for calculating password strength and complexity uses the idea that some guesses are far better than others, since human-based password choices are not random. Furthermore, our approach can easily be generalized to accommodate other methods for storing secret information and authenticating identities and/or accounts.
References
 [1] V. R. Team et al., “Verizon 2012 data breach investigations report,” Tech. Rep., 2012.
 [2] R. Shay, S. Komanduri, P. G. Kelley, P. G. Leon, M. L. Mazurek, L. Bauer, N. Christin, and L. F. Cranor, “Encountering stronger password requirements: User attitudes and behaviors,” in Proceedings of the Sixth Symposium on Usable Privacy and Security, ser. SOUPS ’10. New York, NY, USA: ACM, 2010, pp. 2:1–2:20.
 [3] S. Komanduri, R. Shay, P. G. Kelley, M. L. Mazurek, L. Bauer, N. Christin, L. F. Cranor, and S. Egelman, “Of passwords and people: Measuring the effect of password-composition policies,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ser. CHI ’11. New York, NY, USA: ACM, 2011, pp. 2595–2604.
 [4] D. V. Klein, “Foiling the cracker: A survey of, and improvements to, password security,” in Proceedings of the 2nd USENIX Security Workshop, 1990, pp. 5–14.
 [5] A. Narayanan and V. Shmatikov, “Fast dictionary attacks on passwords using time-space tradeoff,” in Proceedings of the 12th ACM Conference on Computer and Communications Security, ser. CCS ’05. New York, NY, USA: ACM, 2005, pp. 364–372.
 [6] C. Castelluccia, C. Abdelberi, M. Dürmuth, and D. Perito, “When privacy meets security: Leveraging personal information for password cracking,” CoRR, vol. abs/1304.6584, 2013.
 [7] M. Weir, S. Aggarwal, B. de Medeiros, and B. Glodek, “Password cracking using probabilistic context-free grammars,” in Proceedings of the IEEE Symposium on Security and Privacy, May 2009, pp. 391–405.
 [8] Z. Li, W. Han, and W. Xu, “A large-scale empirical analysis of Chinese web passwords,” in Proceedings of the 23rd USENIX Security Symposium, Aug. 2014.
 [9] R. Veras, C. Collins, and J. Thorpe, “On the semantic patterns of passwords and their security impact,” in Proceedings of the Network and Distributed System Security Symposium (NDSS’14), 2014.
 [10] J. Ma, W. Yang, M. Luo, and N. Li, “A study of probabilistic password models,” in Proceedings of the IEEE Symposium on Security and Privacy, May 2014, pp. 689–704.
 [11] B. Ur, S. M. Segreti, L. Bauer, N. Christin, L. F. Cranor, S. Komanduri, D. Kurilova, M. L. Mazurek, W. Melicher, and R. Shay, “Measuring real-world accuracies and biases in modeling password guessability,” in 24th USENIX Security Symposium (USENIX Security 15). Washington, D.C.: USENIX Association, Aug. 2015, pp. 463–481.
 [12] M. Dell’Amico, P. Michiardi, and Y. Roudier, “Password strength: An empirical analysis,” in INFOCOM, 2010 Proceedings IEEE, March 2010, pp. 1–9.
 [13] C. Castelluccia, M. Dürmuth, and D. Perito, “Adaptive password-strength meters from Markov models,” in Proceedings of the Network and Distributed System Security Symposium (NDSS), 2012.
 [14] J. Bonneau, “The science of guessing: Analyzing an anonymized corpus of 70 million passwords,” in Proceedings of the IEEE Symposium on Security and Privacy, May 2012, pp. 538–552.
 [15] M. L. Mazurek, S. Komanduri, T. Vidas, L. Bauer, N. Christin, L. F. Cranor, P. G. Kelley, R. Shay, and B. Ur, “Measuring password guessability for an entire university,” in Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, ser. CCS ’13. New York, NY, USA: ACM, 2013, pp. 173–186.
 [16] X. de Carné de Carnavalet and M. Mannan, “From very weak to very strong: Analyzing password-strength meters,” in Network and Distributed System Security (NDSS) Symposium 2014. Internet Society, February 2014.
 [17] Passfault, http://www.passfault.com/.
 [18] D. Florencio and C. Herley, “A large-scale study of web password habits,” in Proceedings of the 16th International Conference on World Wide Web, ser. WWW ’07. New York, NY, USA: ACM, 2007, pp. 657–666.
 [19] B. Ur, P. G. Kelley, S. Komanduri, J. Lee, M. Maass, M. L. Mazurek, T. Passaro, R. Shay, T. Vidas, L. Bauer et al., “How does your password measure up? The effect of strength meters on password creation,” in USENIX Security Symposium, 2012, pp. 65–80.
 [20] M. Weir, S. Aggarwal, M. Collins, and H. Stern, “Testing metrics for password creation policies by attacking large sets of revealed passwords,” in Proceedings of the 17th ACM Conference on Computer and Communications Security, ser. CCS ’10. New York, NY, USA: ACM, 2010, pp. 162–175.
 [21] R. Shay, P. G. Kelley, S. Komanduri, M. L. Mazurek, B. Ur, T. Vidas, L. Bauer, N. Christin, and L. F. Cranor, “Correct horse battery staple: Exploring the usability of system-assigned passphrases,” in Proceedings of the Eighth Symposium on Usable Privacy and Security, ser. SOUPS ’12. New York, NY, USA: ACM, 2012, pp. 1–20.
 [22] P. Kelley, S. Komanduri, M. Mazurek, R. Shay, T. Vidas, L. Bauer, N. Christin, L. Cranor, and J. Lopez, “Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms,” in Security and Privacy (SP), 2012 IEEE Symposium on, May 2012, pp. 523–537.
 [23] R. Shay, S. Komanduri, A. L. Durity, P. S. Huh, M. L. Mazurek, S. M. Segreti, B. Ur, L. Bauer, N. Christin, and L. F. Cranor, “Can long passwords be secure and usable?” in Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems, ser. CHI ’14. New York, NY, USA: ACM, 2014, pp. 2927–2936.
 [24] R. Anderson, Security engineering. John Wiley & Sons, 2008.
 [25] J.-P. Aumasson, W. Meier, R. C.-W. Phan, and L. Henzen, The Hash Function BLAKE. Springer, 2014.
 [26] S. Yang, S. Ji, X. Hu, and R. Beyah, “Effectiveness and soundness of commercial password strength meters,” 2015.
 [27] K. Rankin, “Hack and /: Password cracking with gpus, part ii: Get cracking,” Linux J., vol. 2012, no. 214, Feb. 2012.
 [28] J. C. Principe, Information theoretic learning: Renyi’s entropy and kernel perspectives. Springer Science & Business Media, 2010.
 [29] C. E. Shannon, “A mathematical theory of communication,” SIGMOBILE Mob. Comput. Commun. Rev., vol. 5, no. 1, pp. 3–55, Jan. 2001.
 [30] M. Hirschhorn, “The AM-GM inequality,” The Mathematical Intelligencer, vol. 29, no. 4, pp. 7–7, 2007.
 [31] P. G. Inglesant and M. A. Sasse, “The true cost of unusable password policies: Password use in the wild,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ser. CHI ’10. New York, NY, USA: ACM, 2010, pp. 383–392.