Abstract
Password users frequently employ passwords that are too simple, or they just reuse passwords for multiple websites. A common complaint is that utilizing secure passwords is too difficult. One possible solution to this problem is to use a password schema. Password schemas are deterministic functions which map challenges (typically the website name) to responses (passwords). Previous work has been done on developing and analyzing publishable schemas, but these analyses have been information-theoretic, not complexity-theoretic; they consider an adversary with infinite computing power.
We perform an analysis with respect to adversaries having currently achievable computing capabilities, assessing the realistic practical security of such schemas. We prove for several specific schemas that a computer is no worse off than an infinite adversary and that it can successfully extract all information from leaked challenges and their respective responses, known as challenge-response pairs. We also show that any schema that hopes to be secure against adversaries with bounded computation should obscure information in a very specific way, by introducing many possible constraints with each challenge-response pair. These surprising results put the analyses of password schemas on a more solid and practical footing.
Human-Usable Password Schemas: Beyond Information-Theoretic Security
SCS Honors Undergraduate Research Thesis
Computer Science Department, School of Computer Science, Carnegie Mellon University
Author: \authorname
Advisor: Prof. Manuel Blum
Co-Advisor: Prof. Santosh Vempala
April 29, 2016
Acknowledgements.
I am thankful to Professor Santosh Vempala for his invaluable help and suggestions, including creating and helping to break several schemas. Thanks to Dr. Jeremiah Blocki for his assistance with the constraint solver, as well as all his very constructive feedback on this paper. Thank you to Samira Samadi and Lisa Masserova for helpful discussion.
Thank you to my Abba for supporting and encouraging me through this process.
Finally, my deepest gratitude is to my advisor, Professor Manuel Blum. He gave me blunt criticism when I deserved it; he nonetheless later agreed to advise me, even when I hadn’t proven myself. He encouraged me when I doubted myself and opened my eyes to the joys of research. I am deeply indebted to him for all this and much more.
Chapter 1 Introduction
Password users frequently employ passwords that are too simple, or they just reuse passwords for multiple websites [1, 2]. A common complaint is that utilizing secure passwords is too difficult. One possible solution to this problem is to use a password schema. Password schemas are deterministic functions which map challenges (typically the website name) to responses (passwords).
Previous work has been done on developing and analyzing publishable schemas. Some provide security even against adversaries who have seen a dozen random challenge-response pairs, but take more than an hour for a human to memorize, or more than a minute to generate responses [3]. Others meet the human-usability requirement but are limited in security, or they only meet the usability requirement with the aid of a semi-trusted computer [4, 5]. Importantly, each of these schemas has been almost exclusively analyzed through a theoretical lens.
When assessing the practical security of a schema, it becomes more beneficial to consider an adversary with bounded computation. As in [4], we define for a given schema a metric known as quality, denoted $Q$ (occasionally we qualify $Q$ with a schema’s name to refer to the schema quality of that particular schema). We define a challenge-response pair as a word chosen from a given dictionary and its respective response; we define a random challenge-response pair as a randomly chosen challenge and its response. A schema is said to have quality $Q$ if an information-theoretic adversary (i.e., an adversarial Turing machine with unbounded computational power) is able to correctly guess the response to the next challenge within ten attempts after seeing an average of $Q - 1$ random challenge-response pairs. Given $Q - 1$ challenge-response pairs, can a computationally bounded adversary be expected to successfully guess the next correct response?
Certainly, there exist schemas for which this is not the case. Specifically, Blocki et al. [5] showed a method of constructing a schema for which a program is believed to require far more than $Q$ examples to break, relying on the intractability of solving random constraint satisfiability problems; Allison et al. [6] achieved the same, relying on the intractability of learning juntas. However, these schemas do not allow the user complete self-sufficiency: in the former, a user is presumed to have access to a semi-trusted computer. In the latter, the website is trusted to provide the same challenge on each visit (or else it would have to know the user’s secret). Should an attacker compromise the website and become able to adaptively provide new challenges to the user, this schema is likely no longer secure.
Our goal then is to develop a schema which requires no involvement from any party except the user yet is powerful enough that a computer requires more than $Q$ examples on average to break it. These schemas are functions applied to an input that remains constant for each website; typically, we apply them to the domain name, though other possibilities exist, since the challenge itself need not remain secure. Chapter 2 introduces the desirable traits of a password schema that meets our requirements.
With these desiderata established, Chapter 3 presents in detail a particular schema with a previously unknown upper bound for $Q$. We also introduce techniques for breaking these types of schemas and address the feasibility of constructing one that is secure against computationally bounded adversaries. Chapter 4 considers schema implementation from the user’s perspective, introducing fundamental operations that a human can perform to transform a challenge into a response. Based on these operations, we introduce certain limitations on a schema which moderately constrain the time that a user has to respond to a challenge.
Using the same axiomatic operations, Chapter 5 presents an argument for why any schema that hopes to be secure must hide information in a very specific way. We then present and analyze one such schema, including the successes and failures of current attempts to break it.
Chapter 2 What Makes a Good Password Schema
A good password schema is one that meets a multitude of requirements for security and usability. The strength of the schema can be assessed by analyzing to what extent it meets each of them. We use the same criteria as [4], restated briefly here for convenience.
1 Desiderata
A good password schema should be publishable, human-usable, secure, self-rehearsing, and analyzable.

Publishable
A schema is publishable if a detailed description of its implementation is publicly available; the security of the schema cannot rely on obscurity, except for the user’s individual, secret key(s).

Human-Usable
The schema must be implementable in the user’s head, without the use of additional instruments such as a calculator or pen and paper. We consider schemas with varying limits on the time a user should take to generate a response from any given challenge. We additionally require the bound of one hour on the total time over a user’s life required to memorize and maintain the schema.

Secure
An adversary who knows the schema should have no better than random chance of being able to correctly guess responses to new challenges among those consistent with challenge-response pairs observed so far. Note that this definition does not limit the adversary to polynomial-time bounded computation; as a result, our definition of security is information-theoretic.
This definition comes from a game played between the user and an adversary: an impartial judge provides an adversary with a randomly chosen challenge from the dictionary, and the adversary gets 10 attempts to guess the correct response. If none of the guesses is correct, the judge provides the adversary with the response and repeats with a new challenge. This is repeated until the adversary correctly states the response to a new challenge. $Q$ is then the average number of challenges that the adversary needed to see in order to be able to guess the correct response, including the final one. Note that $Q$ is dependent on the length of the challenge.

Self-Rehearsing
Use of the schema should result in frequent practice of every part of its implementation, such as applying functions over the whole domain. For example, if the schema involves a map on the domain of letters, then J, Q, and other uncommon letters might be mapped very infrequently. Steps must be taken to ensure that they are still practiced normally, so the user doesn’t forget them.

Analyzable
The schema should be stated so explicitly that the instructions are able to be followed by a Turing machine. Note that we do allow for randomization over the choice of secret keys, but the domain over which the random choice is made and the method of randomization must be clearly defined and published.
2 Specific Definitions
A few of our definitions or expectations are more specific than these general desiderata. Particularly, for our analysis of $Q$ we define the dictionary from which challenges are selected as the uniform distribution over all strings of a particular length using the twenty-six letters of the alphabet. Blum and Vempala [4] also consider a sample space of the most popular website names, but for a simpler analysis we stick with the uniform distribution over random strings.
In addition, we occasionally impose somewhat stricter limitations on response time for the user. Blum and Vempala [4] originally imposed a maximum response time of thirty seconds; we consider a schema which limits the user to one second per letter of the challenge. Since we primarily consider challenges of lengths between six and twelve characters, for the most part this limit more than halves the time available to the user.
Chapter 3 Breaking Schemas with Bounded Computation
3 Example Schema: Digit Schema 3 (DS3)
We present one of the schemas analyzed in [4]: DS3, so titled because it utilizes a map from letters to digits. This schema is one of the simpler ones, with a high ratio of security to usability.
3.1 DS3 Implementation
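A challenge $C$ consists of $n$ letters $c_1 c_2 \ldots c_n$ and the response consists of $n$ digits $r_1 r_2 \ldots r_n$. All addition is modulo 10.
The secret key consists of:
$f$, a random map from the alphabet to digits
$g$, a random permutation on digits
Let $\mathrm{DS3}_{f,g}(C)$ denote the response to $C$ under DS3 using secret maps $f$ and $g$.
To determine $\mathrm{DS3}_{f,g}(C)$:
Output $r_1 = g(f(c_1) + f(c_n))$
For $i = 2$ to $n$:
Output $r_i = g(r_{i-1} + f(c_i))$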
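As a minimal sketch, a schema of this shape can be written in Python as follows. The update rule used here (the first digit is computed from the first and last letters, and each later digit is chained from the previous output) is an assumption of this sketch, as are the helper names `random_key` and `ds3_response`.

```python
import random
import string


def random_key(rng):
    """Draw a secret key: f maps letters to digits, g permutes the digits."""
    f = {letter: rng.randrange(10) for letter in string.ascii_lowercase}
    g = list(range(10))
    rng.shuffle(g)
    return f, g


def ds3_response(challenge, f, g):
    """Assumed rule: r_1 = g(f(c_1) + f(c_n)), r_i = g(r_{i-1} + f(c_i)), mod 10."""
    digits = [g[(f[challenge[0]] + f[challenge[-1]]) % 10]]
    for letter in challenge[1:]:
        # Each later digit depends on the previous output and the current letter.
        digits.append(g[(digits[-1] + f[letter]) % 10])
    return "".join(str(d) for d in digits)


if __name__ == "__main__":
    f, g = random_key(random.Random(0))
    print(ds3_response("amazon", f, g))
```

The response has the same length as the challenge, and every step uses only one map lookup, one single-digit addition, and one permutation lookup.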
3.2 Analysis
In the guessing game that defines $Q$, each challenge-response pair that an adversary observes can provide some new information about the user’s secret mapping. For an information-theoretic adversary, the simplest technique for breaking the schema would be to maintain all possible $(f, g)$ pairs in a set $S$ and eliminate them as inconsistencies arise. Since all mappings are equally likely, the adversary is unable to distinguish between any pairs that are so far consistent with the seen challenge-response pairs. Thus, the adversary is only guaranteed to correctly guess the response to the next challenge when
$|S| \le 10$ (though he could get lucky and guess it sooner).
We can see that the size of $S$ is reduced in expectation by at least a factor of 10 with each seen challenge-response pair; if there are fewer than 10 unique responses by the remaining possible secret mappings, the adversary will guess the correct response and the game will end. Since all mappings are initially possible and the challenges are chosen at random, we expect the remaining possible mappings to distribute approximately evenly among the possible responses; this means that seeing the correct response should eliminate at least $\frac{9}{10}$ of the so-far consistent mappings, which implies that for any schema, $Q$ is roughly upper-bounded by the logarithm base 10 of the size of the secret key space.
For DS3, the key space for $f$ is $10^{26}$ and for $g$ is $10!$. Thus, the upper bound for $Q$ is roughly $\log_{10}(10^{26} \cdot 10!) \approx 32.6$. Simulating the above information-theoretic technique on random challenges of length 10, [4] gave an estimate for $Q$ of 6.91. However, this is still with the assumption of an infinite adversary; an actual computer obviously does not have the capacity to maintain all possible mappings in memory. This means that this value serves as only a lower bound for the security of the schema for a computationally bounded adversary. The question remains: can a computer break any schema with $Q$ pairs in expectation, or is the infinite computational power actually necessary to extract and utilize all the information contained in the challenge-response pairs? More generally, can a human being possibly do enough computation and obfuscation in a limited amount of time (thirty seconds, by the limit set in [4]) that a computer cannot extract all the available information?
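The key-space bound can be checked numerically. This short sketch assumes the key-space sizes that follow from the definitions of the two secret maps: $10^{26}$ maps from a 26-letter alphabet to digits, and $10!$ permutations of the digits. The variable names are illustrative only.

```python
import math

ALPHABET_SIZE = 26
DIGITS = 10

f_keys = DIGITS ** ALPHABET_SIZE   # number of maps from letters to digits
g_keys = math.factorial(DIGITS)    # number of permutations of the digits

# Each observed pair removes at least a factor of 10 in expectation,
# so log10 of the key-space size bounds the number of pairs needed.
bound = math.log10(f_keys * g_keys)
print(round(bound, 2))
```

The gap between this bound and the simulated estimate of 6.91 reflects that the logarithmic bound is loose: a single pair of length 10 usually eliminates far more than nine tenths of the remaining keys.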
As an alternative to holding all possible mappings in memory, suppose instead the adversary simply assigns an ordering to the secret keys (for the case of DS3, there is an obvious, easy ordering) and then iterates through them one at a time. At each possible key, the adversary tests if it is consistent with the observed challenge-response pairs. If so, he guesses the next response using that secret; if not, he eliminates it and moves on to the next one. This bypasses the memory issues imposed by the previous technique, but introduces a new problem: considering every possible key is intractable. So yes, a computationally bounded adversary can extract all the information with limited memory, but whether or not it can be done in a reasonable amount of time remains to be seen. In order to build a solver that can hope to break a schema with $Q$ challenge-response pairs in a reasonable time frame, the problem must be approached more cleverly.
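The ordered, low-memory enumeration can be sketched on a deliberately tiny instance. Everything below is an assumption made for illustration: a three-letter alphabet, a modulus `m` of 4 instead of 10, the DS3-style update rule in `response`, and the function names. At full size the same loop would have to visit on the order of $10^{26} \cdot 10!$ keys, which is exactly the intractability noted above.

```python
from itertools import permutations, product


def response(challenge, f, g, m):
    """Assumed DS3-style rule with modulus m (the text uses m = 10)."""
    out = [g[(f[challenge[0]] + f[challenge[-1]]) % m]]
    for letter in challenge[1:]:
        out.append(g[(out[-1] + f[letter]) % m])
    return tuple(out)


def first_consistent_key(pairs, alphabet, m):
    """Walk the key space in a fixed order, holding only one key in memory."""
    for values in product(range(m), repeat=len(alphabet)):
        f = dict(zip(alphabet, values))
        for g in permutations(range(m)):
            # Keep the key only if it reproduces every observed response.
            if all(response(c, f, g, m) == r for c, r in pairs):
                return f, g
    return None


if __name__ == "__main__":
    secret_f = {"a": 1, "b": 3, "c": 0}
    secret_g = (2, 0, 3, 1)
    seen = [(c, response(c, secret_f, secret_g, 4)) for c in ("abc", "cab", "bca")]
    print(first_consistent_key(seen, "abc", 4))
```

The key returned need not equal the secret key; it only has to agree with it on every pair seen so far, which is all the adversary can check.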
Note that in this context we are worried about actual running time, rather than asymptotic complexity. Even if it requires a computer program exponential time to break a schema, it could still potentially succeed extremely quickly (i.e., a few seconds to a few minutes). This is because the schema must be implemented by a human, which severely limits the size of the problem. So, instead of looking for a human-usable schema which requires any program exponential time to break, we are looking for one that takes a computer an extremely long time to solve, at which point a user can simply change his secret key. We do not define an exact duration which we consider to be long enough; it should be considered acceptable if the user is willing and able to memorize a new secret key with the same frequency, to ensure security.
4 Constraint Solving
Because the schema is published, the adversary knows exactly how the secret key(s) were used to transform the challenge into the response. This means that each challenge-response pair shared with the adversary provides a set of constraints on the correct secret mapping, which can be used to eliminate inconsistent mappings without explicitly trying them. For example, a single challenge-response pair from DS3 results in constraints involving $f$ and $g$.
There is a wide range of methods for solving this problem, each one differently balancing the trade-off between flexibility and speed. Simple Gaussian elimination is incredibly fast; with this strength comes the fact that it only works for equalities. If the constraints on the secret include inequalities, Gaussian elimination will not work. Note that we could choose to ignore those constraints in our solver. While this would certainly work, the fact that our solver is explicitly ignoring information contained in each challenge-response pair means that we will almost certainly require more than $Q$ pairs on average to break the schema.
If the constraints include inequalities, the next solution to consider is linear programming, where the objective to minimize is just a penalty function and incorrectly guessing the response is associated with an infinitely large penalty. This is also quite fast, but has its own weakness: for linear constraints with a modulus, the objective function is nonconvex. Since addition or multiplication with two or more digits takes humans quite a while without pen and paper, most human-usable schemas will use modular arithmetic. This means that for most human-usable schemas, linear programming is out.
This leaves two clear remaining possibilities: mixed integer linear programming (MILP) and constraint solving. MILP is linear programming with the ability to constrain certain variables to integers. With this method we can solve systems of linear inequalities with a modulus. MILP is quite slow (indeed, it is NP-hard, as can be shown with a simple reduction from Vertex Cover\footnote{\url{https://en.wikipedia.org/wiki/Integer_programming#Proof_of_NP-hardness}}), but it is still feasible since the number of constrained variables is limited by the amount of computation a human can do. Because the schema is implemented by a human, asymptotic complexity is not a concern.
Constraint solving is more general than MILP. Constraint solvers actually utilize MILP solvers, but with some additional specific constraint types, such as specifying that certain sets of variables are all the same or all different. Constraint solvers can also have symbolic variables representing entire functions rather than single applications of a function; this unique feature means that as function applications are chained together, the number of variables in a constraint solver increases additively, as opposed to multiplicatively as in linear programming. Additionally, unlike MILP, constraint solving is focused on feasibility rather than optimization. This is actually more appropriate, because the only thing our schema solver needs to find is a secret key that is consistent with all previously seen challenge-response pairs. However, since finding a solution is equivalent to solving a MILP problem, it’s more beneficial to analyze modern MILP limitations when considering solver capabilities.
Both kinds of problem are traditionally solved with the "Branch and Bound" method described in [7] (or a refined version of this technique known as "Branch and Cut") and are therefore similar in complexity and runtime. As noted above, constraint solving allows for symbolic representation of function application constraints and the use of "higher-level" constraints, which increases complexity but decreases the total number of constraints needed.
While [4] provided an estimate of 6.91 for $Q$ (for challenges of length 10), to the best of our knowledge, no upper bound for a computationally bounded adversary has been previously defined. Using Microsoft Solver Foundation’s constraint solver, we wrote a program to break DS3 according to the response-guessing game. How our solver cracks this schema is quite simple, because each new challenge-response pair reliably produces a new set of linear constraints. The constraint solver searches for possible values for the user’s secret keys $f$ and $g$ that are consistent with previously seen examples and uses them to guess the next response. Following the rules of the game, if the program guesses correctly, the round is over. Otherwise, it is given the correct response (which is utilized by translating the challenge-response pair into new constraints and applying them to the system) and allowed to guess the response to the next one.
As the system has more and more constraints applied to it, the space it searches for a potential solution rapidly shrinks until it eventually correctly guesses the response. At this point we note the number of challenge-response pairs it saw and move on to the next round. This method of cracking this schema runs extremely quickly; a single round takes no more than a second, often much less. DS3 is an excellent example of how schemas can be moderately difficult for a human to implement and yet very simple for a computer to break quickly. Over several hundred thousand experiments, we estimated an empirical upper bound for $Q$ of 6.89. Exactly calculating $Q$ is infeasible.
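The guessing loop described above can be sketched without a commercial solver. The following is not the Microsoft Solver Foundation program used for the experiments: it swaps in a small hand-written backtracking search over the inverse permutation $h = g^{-1}$, it assumes the DS3 update rule $r_1 = g(f(c_1) + f(c_n))$, $r_i = g(r_{i-1} + f(c_i))$, and it simplifies the game to one guess per challenge instead of ten. All function names are hypothetical.

```python
import random
import string


def respond(challenge, f, g):
    """Assumed DS3 rule: r_1 = g(f(c_1)+f(c_n)), r_i = g(r_{i-1}+f(c_i)), mod 10."""
    out = [g[(f[challenge[0]] + f[challenge[-1]]) % 10]]
    for letter in challenge[1:]:
        out.append(g[(out[-1] + f[letter]) % 10])
    return out


def forced_f(pairs, h):
    """Values of f forced by a partial h = g^-1, or None on a contradiction."""
    f = {}

    def put(letter, value):
        return f.setdefault(letter, value) == value

    for challenge, resp in pairs:
        # r_i = g(r_{i-1} + f(c_i))  implies  f(c_i) = h(r_i) - r_{i-1}.
        for i in range(1, len(challenge)):
            if resp[i] in h and not put(challenge[i], (h[resp[i]] - resp[i - 1]) % 10):
                return None
    for challenge, resp in pairs:
        # r_1 = g(f(c_1) + f(c_n))  implies  f(c_1) = h(r_1) - f(c_n).
        last = challenge[-1]
        if resp[0] in h and last in f:
            if not put(challenge[0], (h[resp[0]] - f[last]) % 10):
                return None
    return f


def solve(pairs, h=None):
    """Backtracking search for some key (f, g) consistent with every pair."""
    h = {} if h is None else h
    f = forced_f(pairs, h)
    if f is None:
        return None                      # prune: this partial h is inconsistent
    if len(h) == 10:
        g = [0] * 10
        for digit, preimage in h.items():
            g[preimage] = digit          # invert h to recover g
        # Letters never observed are unconstrained; default them to 0.
        return {c: f.get(c, 0) for c in string.ascii_lowercase}, g
    digit = len(h)                       # assign h(0), h(1), ... in order
    for value in range(10):
        if value not in h.values():
            found = solve(pairs, {**h, digit: value})
            if found:
                return found
    return None


def play_round(rng, length=10):
    """One round of the game, simplified to a single guess per challenge."""
    f = {c: rng.randrange(10) for c in string.ascii_lowercase}
    g = rng.sample(range(10), 10)
    pairs = []
    while True:
        challenge = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        truth = respond(challenge, f, g)
        guess_f, guess_g = solve(pairs)
        if respond(challenge, guess_f, guess_g) == truth:
            return len(pairs) + 1        # count the final challenge as well
        pairs.append((challenge, truth))


if __name__ == "__main__":
    rng = random.Random(0)
    rounds = [play_round(rng) for _ in range(20)]
    print(sum(rounds) / len(rounds))
```

Because this sketch allows only one guess per challenge, the average it prints is expected to be somewhat larger than the ten-guess figure reported in the text; it illustrates the mechanics of the attack rather than reproducing the measurement.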
This result, that a computer can successfully extract all information included in a set of leaked challenge-response pairs, is the first instance of a true upper bound on the security of a schema for a computationally bounded adversary. While previous work was purely theoretical, for the first time we have concrete evidence that constraint solving can be used to break a schema in the smallest number of challenge-response pairs possible. This work further indicates that for any human-usable schema, until there is evidence otherwise, a pragmatic analysis should consider $Q$ as the actual number of challenge-response pairs that a real-life adversary would require to break a user’s secret keys. However, the amount of time that an adversary would take to break said schema is still in question; this problem is addressed in Chapter 5.
Chapter 4 Time-Bounded Schemas
To consider how a human can hide information from a computationally bounded adversary, we must first explicitly define the capabilities of a human when implementing a password schema. Blum and Vempala [4] gave a bound on time needed to apply the schema to a single challenge, but this bound does not consider specific actions to be carried out by a user. In order to get an idea of what kind of obfuscation a human-usable schema can achieve, we have to be able to break a schema into its individual operations and determine how they work together to hide information from an adversary.
5 Operations
We define a set of axiomatic operations, OPS, that we believe enumerate (!) all possible operations a human can perform with the letters of a challenge and any resulting values. To each of these operations we assign a time cost. These time costs were determined empirically and represent what we consider to be a reasonable lower bound on the amount of time the average user would require to complete an operation. More specifically, we derived these time costs by implementing a wide array of schemas with varying operations and then solving for the minimum individual operation times. This metric therefore relies on the assumption that the authors of this paper are not significantly slower than the average user.
Using the estimated times for each operation, any analyzable schema can be broken down into its constituent operations and the amount of time to apply the schema to a particular challenge can be estimated. The individual elements of OPS are listed below, along with their respective time costs. Keep in mind that the time cost is for the average user, and in cases where an exception might be made, it is listed.

Perform an arithmetic operation [ seconds]
– This is for adding or multiplying two 1digit numbers. It becomes more complex for larger numbers, but these can be broken down into combinations of 1digit operations. This does not include incrementing or decrementing a value that is being held in memory. If we are holding a single value in memory, incrementing or decrementing it costs seconds.
– This operation relies on the user having memorized their times tables and being comfortable with addition. If the user has memorized larger addition or times tables then those are included.
– If the operation is performed with a simple, intuitive modulus (i.e. 2, 5, or 10), the time cost does not change. If the modulus is more complicated, it counts as additional operations (see search, below). If the user has memorized other moduli or has an extremely fast method of calculating it (comparable to "keep the last digit" for mod 10), then these are included.

Apply a function (map) or permutation [ seconds]
– This operation encompasses the application of any memorized function that accepts a single argument (e.g., $f$ and $g$ in DS3). If a function accepts multiple arguments, a human will have to perform something akin to currying (considering one argument at a time), which will take longer.

Iterate to a value (context switch) [ seconds]
– This operation is for any context switch. This includes considering a different value in memory, switching to a different function, and considering the next letter of the challenge.

Output a character [ seconds / seconds]
– This operation is for outputting any character, whether it was generated via a schema or is just meant to artificially lengthen the response (e.g., prepending the response with "aA@1" to meet a password requirement). Obviously a generated character takes longer to output; if the user knows the entire set of characters to be typed, it is much faster to consider the set of characters as a whole and type them together quickly. We found that outputting a character that was individually generated accrues a greater time cost , while is the lower bound for the cost per letter of any other type of output.

Classify/Compare a character [ seconds]
– This operation is for grouping letters or digits into simple, well-known categories, such as vowels/consonants or evens/odds. The categories must be known before learning the schema, commonly rehearsed, and limited to a small number of groups; this varies per user, but is probably at most three to four. For example, most users would not be able to classify a two-digit number as prime/composite within this time cost. If a schema required such a classification, this would fall under an apply operation.\footnote{The classify operation is essentially the same as apply, but it takes less time because it’s for already-known, often-practiced classifications (and it’s limited to a small number of categories). This has the upside of costing less time, but the downside of being enumerable by a computer, so it can’t be relied upon as the only method of obfuscation (see Lemma 1).}
– Because our schema must be publishable and analyzable, any classification used in the schema must be explicitly stated (though the schema can list several options from which a user can choose at random). A schema that expects the user to choose their own method of classification is not analyzable and therefore does not fit our requirements. This requirement implies that the classifications that the user can choose from will be easily enumerable by a computer; it is unreasonable that the average user could be expected to already know and frequently practice so many classifications that this is not true. We limit the number of possibilities to something on the order of , which we consider to be a very generous limitation (our own brainstorming sessions provided closer to unique classifications).
– This operation also includes simple comparisons, such as determining if a given number is larger than another number.

Search for a value [ seconds]
– This operation, which is quite variable in terms of time cost, is for any time a user wishes to "fill in the blank" with a familiar value. For example, if the user is calculating 53 mod 7, he will likely search for a known multiple of 7 that is as large as possible without being greater than 53 (if the user must iterate through multiples of 7, this will obviously take longer than a single search operation). Having found 49, he will then subtract 49 from 53 to find the answer. In our tests, we found it common for users to also evaluate subtraction and division by performing a search operation (i.e., searching for the value such that ).
– We limit this operation to two inputs. If the calculation requires more than two inputs, it must be broken down into several chained applications.
– Like classification, an analyzable schema would have to explicitly state how to use this operation, meaning if the user must choose at random from a set of possible ways to use the operation, all the choices will be enumerable by a computer—we use the same limitation of .
– We can further subdivide this operation into three categories:

If none of the inputs are unknown to the adversary, then it doesn’t hide any information; instead it is just a tool to help the user complete some other calculation such as division or subtraction.

If one of the inputs is unknown to the adversary, this operation is akin to an apply operation, with the distinction being that it requires more work to calculate, but is naturally self-rehearsing.

If both of the inputs are unknown to the adversary, this operation is similar to a perform operation, where the combination of the values is public but the values themselves are not.

5.1 Lemma 1
Suppose a user must choose a combination of classify operations (chosen with replacement out of options) and search operations (chosen with replacement out of options) to be used to map a public input, with the restriction that . Then all such possible combinations are enumerable by a modern computer.
According to our estimated time costs for the elements of OPS, this inequality limits us to or . We specify this limitation because chaining together multiple applications of classify or search could theoretically be condensed into a single apply operation. So, if the time cost of this combination of operations is greater than the time cost of an apply, the user might as well just use an apply instead.
Since one of the inputs is public, this requires that any application of the search operation is of either the first or second subtype. We will exclusively consider those for which one of the inputs is private, as otherwise it is clear that the search operation will not increase the total number of possible combinations.
By definition, both and are easily enumerable by a computer—we provided an upper bound of . Given the limitations above, the total number of possible choices is equal to . With such a relatively small number of choices, a modern computer can easily simulate each combination on the public inputs. With this method, a program could maintain all consistent combinations, eliminating them as they are revealed to be impossible. ∎
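The enumeration argument in the proof can be illustrated with a toy sketch. The three classifiers below, the choice of two chained operations, and all names are hypothetical and chosen only for illustration; the sketch covers classify operations only and is not the list of classifications considered in the text. The point is that every combination can be simulated on the public inputs and discarded as soon as it disagrees with an observation.

```python
from itertools import product
import string

# Hypothetical, publicly known classifications of a letter into two groups.
CLASSIFIERS = {
    "vowel": lambda ch: int(ch in "aeiou"),
    "second_half": lambda ch: int(ch > "m"),
    "odd_position": lambda ch: (string.ascii_lowercase.index(ch) + 1) % 2,
}


def apply_combo(combo, letter):
    """Apply each chosen classifier to the public letter, in order."""
    return tuple(CLASSIFIERS[name](letter) for name in combo)


def consistent_combos(observations, k):
    """Enumerate all length-k choices (with replacement); keep the consistent ones."""
    return [
        combo
        for combo in product(CLASSIFIERS, repeat=k)
        if all(apply_combo(combo, letter) == out for letter, out in observations)
    ]


if __name__ == "__main__":
    secret = ("vowel", "odd_position")
    observed = [(ch, apply_combo(secret, ch)) for ch in "abnc"]
    print(consistent_combos(observed, 2))
```

With three options and two chained operations there are only nine combinations to try, and four observed letters already single out the secret choice in this example.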
Note that such an argument does not hold for the apply operation because the possible space of secret functions is larger than can reasonably be attacked with brute force by a bounded adversary; if this were not the case, then a computationally bounded adversary could break any schema with Q examples by enumerating all possibilities, as described above.
6 Definitions
We define a one-pass schema as any schema that consists of a single pass through the challenge, during which the user must use all the letters to produce the response.
By contrast, a two-pass schema is defined as any schema that can be divided into the following two phases: the first phase involves a single pass through the challenge, using as many or as few of the letters as desired. During this phase no elements of the response are outputted; instead all computations are done as a method of "seeding" the second phase. At the end of the first phase, the user will have some value to be used in the second phase; this could be something like a starting position, or something more complex such as a designation of which of several functions he should use. We refer to this value as the seed. The second phase proceeds as a one-pass schema, but the user can also use whatever seed he got from the first phase.
To consider the usefulness of the preliminary iteration through the challenge, we present the following way of thinking: consider the first pass as a process of traversing a directed acyclic graph (DAG), in which each node represents a set of intermediary values and each leaf represents a possible seed. Each time a user uses a letter of the challenge in this pass, he follows the appropriate edge to the next node, until he eventually arrives at the leaf which tells him the seed for the second pass through the challenge. Since no elements are outputted during the first phase, an adversary has no information about which leaf the user ended on; the idea being that if the user hides which seed he used for the second pass, much of the information leaked by challenge-response pairs will be obscured and difficult for the adversary to extract.
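The two-phase structure can be made concrete with a small sketch. The particular seed (the sum of the mapped letters modulo 10) and the chained second pass are hypothetical choices made for illustration, not a schema proposed in the text; `f` is assumed to be a secret letter-to-digit map and `g` a secret permutation of the digits.

```python
import string


def two_pass_response(challenge, f, g):
    """Hypothetical two-pass schema: a silent seeding pass, then an output pass."""
    # Phase 1: one pass over the challenge that outputs nothing and ends on a seed.
    seed = sum(f[letter] for letter in challenge) % 10

    # Phase 2: an ordinary one-pass schema that starts from the hidden seed.
    out = []
    previous = seed
    for letter in challenge:
        previous = g[(previous + f[letter]) % 10]
        out.append(previous)
    return out


if __name__ == "__main__":
    f = {letter: 1 for letter in string.ascii_lowercase}
    g = list(range(10))
    print(two_pass_response("abc", f, g))
```

Since the seed is never output, every response digit depends on a value the adversary cannot observe directly, which is the obscuring effect the DAG picture describes.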
7 Assumptions
In addition to these definitions, we’ll assume that no element of the response can be a direct mapping from an element of the challenge (i.e., the result of applying a map to a single element and then outputting it). This assumption makes it more likely that information in the secret mapping is properly obscured; otherwise a single challenge-response pair would leak too much information about the user’s secret.
Here we point out that when considering the challenge, there is no way to use a particular letter to influence the result of either pass except by mapping it to some other value set: there are no already-defined operations on letters, and to create one would count as memorizing and applying a function. With this in mind, we’ll add one final assumption: our schema cannot use a small number of classify and/or search operations as the sole method for converting any particular letter of the challenge into a value that can be used; by Lemma 1 this would not have sufficient security. If the schema instead uses so many classify operations that the number of possibilities is too large to be "brute-forceable" (vulnerable to a brute-force attack), the amount of time such a combination would take for a user to evaluate would exceed that of just applying a map, so the time cost of applying a map will serve as a lower bound anyway.
These assumptions imply that in the second pass a map must be applied to all of the letters: we have to use all the letters, and applying a classification to any of the letters won’t provide enough security, as explained above. Additionally, the map must be applied to each letter individually: given that the function must have only one input, the only alternative is to apply the function to multiple letters at a time. Memorizing a map on even just pairs of letters is infeasible; this function has a domain of size $26^2 = 676$ (whose memorization would certainly exceed our stated bound of one hour over one’s lifetime) and is also very clearly not self-rehearsing, as most letter pairs occur rarely, if at all.
8 One- and Two-Pass Schema Analysis
With these assumptions and time costs in hand, we can now consider the feasibility of creating a secure two-pass schema with a time bound of one second per letter for the user’s response.
8.1 Theorem 1
Let be a two-pass schema that can be precisely defined by some combination of the operations in OPS, where the time cost of those operations is limited to (on average) one second per letter. Define as the one-pass schema which consists of just the second pass of , where the seed that would have been determined in the first pass is unchanged but public. Let represent the number of possible seeds that could result from the first pass, and let be any computationally bounded adversary that can enumerate a set of size . If can solve in , then can solve in .
This statement has one major assumption about the capabilities of the adversary, but it turns out that this assumption is actually not as hard to satisfy as it may appear. If we consider possible values for , we can see that there are severe limitations on its maximum size. If designates a choice of function, it is limited to a very small number—probably no more than 3. Expecting a user to memorize more than that is unreasonable in the hour total memorization time limitation. Another possibility is that represents a starting position. As we’ll show below, the limitation of one second per letter is so tight that performing just an additional 6 iterate operations is not possible, meaning . Finally, could just be a value to be used in a perform operation; this would require it to be a single digit. So, generally appears to be restricted to around ; if it were much larger than this, using it would need several operations, which the time restriction really does not allow for.
Next we consider . As we will show, the time constraint ensures that is extremely limited in how it can transform a challenge into a response—so limited, in fact, that it behaves almost exactly like . The fact that it can’t really do anything more than map each letter and combine the results before outputting strongly implies that . Considering the schemas defined in [4], even those significantly more complicated than had schema qualities of no more than 12.
It is our belief that these two limitations are therefore realistic, which would imply that is no greater than . Given modern computational capabilities, this even allows for quite a bit of error in our estimates. We therefore claim that this requirement is not very difficult for an adversary to meet.
Finally, note that gives strictly more information to the adversary than , so is trivially less than or equal to .
Proof of Theorem 1:
Define as the length of the challenge and as the number of generated characters in the response. This means a user has seconds to respond with characters, plus whatever additional characters he adds on (e.g. for meeting website password requirements). We begin by considering the time costs of the second pass. In the second pass the user must iterate to each letter [ seconds], apply a map to each letter [ seconds], and ultimately he’ll have to output generated characters for the response [ seconds]. He still has to somehow combine the results of the mappings; otherwise a direct 1-to-1 mapping will appear in the response, which violates our assumption. Since there are letters, there are two ways to combine these values. The user can perform an operation pairwise (this would result in a minimum of operations), but this will require him to iterate to each pair of values after he has mapped them [ seconds]. Alternatively, he can keep the previous value in memory and update it with each new value as he comes to it; this is only a perform operation, but it must be done times [ seconds]. This means the second pass, at the absolute minimum, will cost seconds.
Clearly, the first pass can’t use all the letters—to do so would require, at a minimum, iterating to each letter [ seconds]. At this point we have already surpassed our limit of seconds for both passes, so using all the letters in the first pass is not a possibility.
Now we consider using some subset of the letters in the first pass. Because we’re not using all of the letters, we also have to check at each letter if it should be included in the first pass. This is a classify operation and therefore costs the user an additional seconds per letter considered. So, for each of the included letters, the user must iterate to the letter, classify whether or not the letter is included, map the letter, and then perform an operation to follow an edge of the graph. This will cost the user seconds to travel to a leaf of the DAG, with the generous assumption that he doesn’t consider any letters that aren’t included.
Given that the total time cost of the schema applied to a challenge of length is limited to seconds, this gives us . Consider some possible values of and . Suppose we only want to output one letter in the response for every two letters in the challenge—that is, . Even with this reduction, this gives us . This implies that the first pass can use only of the letters of the challenge to traverse the graph, but this is too small of a fraction! Considering a limitation of 30 seconds to produce a response, the first pass couldn’t even use one letter.
Reducing the fraction of letters that are output raises an additional complication: by doing the minimum of combination operations, the user generates values in the second pass. If he wants to output fewer characters than that, he will have to perform an additional classify operation to check whether a given value should be output. This gives the new inequality , further limiting the use of letters in the first pass. Clearly must be greater than 0; we consider a minimum of . In this case, must be no more than , meaning that less than one in thirteen of the challenge letters can be used for the first pass.\footnote{It is here that we reference our previous claim regarding possible values for : even with a maximum of , our user has at most seconds for the first pass. If we want to use even one letter for the first pass, , which implies we would have an additional 1.25 seconds. This is enough time for at most 5 iterate operations.}
So, the first pass is limited to at most one or two letters of the challenge, even if is quite large and is small, meaning the number of possible paths through the DAG describing the first pass would be very small and easily brute-forceable by a computer. Since it’s brute-forceable, the adversary could behave as if it were unbounded, simulating all possible paths.
Since each seed can take on one of values, after challengeresponse pairs there are possible combinations of seeds. As this is enumerable, can simply solve with each possible combination and eliminate inconsistencies—this is equivalent to making the seeds publicly known. This means that in expectation, once sees pairs, it will be able to determine the correct seed (or, at the very least, a seed that will allow it to correctly guess the next response) for each seen challenge. Using those seeds, can work backwards, bruteforcing possible paths that were taken in the first pass, which will uncover enough information about the user’s secret to determine the seed for the next challenge. Since has already seen at least challengeresponse pairs, it will be able to break and give the correct response to the challenge, which means has also been broken in . ∎
So, a two-pass schema (limited to one second per letter) is computationally no harder to break than a one-pass schema. What if we drop the first pass and spend the entire time allotment on just a one-pass schema? The same assumptions and limitations would apply to the single pass of this schema as do to the second pass of a two-pass schema. Certainly, we could just output letters in the response and be done, though this schema functions almost exactly the same as DS3, which we know can be broken in . What other options remain? If we decide not to output all generated values, we have to classify each letter as explained above, which gives us . This means that ; with a challenge length of 15, the user would only produce a response of length 6.
In spite of this apparently negative result, this technique is actually a step in the right direction for possibly creating a password schema that a computer cannot break in a reasonable amount of time with challenge-response pairs. As we discuss in the next chapter, a schema that hopes to achieve this level of practical security must obfuscate information in a specific way, by hiding which set of constraints correctly encodes the information of each challenge-response pair.
Chapter 5 Feasibility of Future Schemas
9 What Can Computers Break?
With the establishment of modern constraint solving techniques and definitions of and limitations on a human’s computational capability, we are ready to more generally consider the possibility of schemas that, in expectation, a computationally bounded adversary requires more than challenge-response pairs to break. We begin by considering how each of the operations a human can perform could hide information from an adversary and what distinguishes the capabilities of an infinite adversary from those of a computer.
Of the six operations that make up OPS, it is immediately apparent that iterate and output don’t actually hide any information. These two operations exist exclusively to allow the user to carry out the rest of the schema and are irrelevant to security. The remaining four operations each hide information in a specific way such that an adversary knows exactly how to define constraints given a challenge-response pair and knowledge of the operation’s use.
For any publishable password schema, a challenge-response pair provides the adversary with one or more sets of possible constraints on the user’s secret key. For a given challenge length , we define the expansion factor of a schema as the expected number of possible sets of constraints encoded by a single challenge-response pair, denoted . This value represents how quickly we expect the number of possible systems of constraints to grow as a function of the number of seen challenge-response pairs. Specifically, after seeing challenge-response pairs, an adversary would be expected to have to consider on the order of different combinations of sets of constraints.
We now define two exhaustive and disjoint categories of password schemas: direct and indirect. For a given challenge length , a password schema is said to be direct if each challenge-response pair corresponds to exactly one set of constraints with no ambiguity (i.e., for all direct password schemas, the expansion factor is exactly 1). DS3 is one such schema; every element of the response is equal to (except for ), which means that each challenge-response pair gives the adversary a single set of exactly constraints. These constraints convey the information contained in the challenge-response pair, and they can be applied to a solver to help it guess the correct mappings. In general, the constraints that are derived from a challenge-response pair of a direct schema include all of the information included in that challenge-response pair. Since the schema is both direct and analyzable, the resulting constraints will give the constraint solver exactly as much information as the pair gives to an information-theoretic adversary.
Indirect password schemas are those for which the expansion factor is greater than 1. In other words, for an indirect schema, each challenge-response pair could encode one of several sets of constraints; such schemas hide from the adversary which set of constraints is correct. When an adversary sees a challenge-response pair from an indirect password schema, each of the possible sets of constraints potentially encodes all information in that pair, but only one is correct.
It’s important to note that indirect schemas have wildly varying expansion factors. If the expansion factor is reasonably small (less than 10), a constraint solver can handle the incomplete knowledge with an "OR" constraint, which requires that at least one of several sets of possible constraints is true. If it’s much larger than that, solving the schema becomes a lot more complicated; given that the schemas we’ve considered tend to have a around 7, a constraint solver using "OR" constraints would have to store and work with approximately constraint sets, with each set containing several constraints. We found this to be well beyond the bounds of computation with today’s available machines: in our experiments, when , the solver couldn’t find a solution even with several days of runtime.
9.1 Conjecture 1
Let be any direct, human-usable schema, limited to 30 seconds for generating any single response. Then a modern desktop computer, using a state-of-the-art constraint solver and/or MILP solver, can break with challenge-response pairs in expectation in no more than 24 hours.
Constraint solvers from as early as 1970 could solve problems with thousands of integer variables and tens of thousands of unconstrained linear variables. Practically, these algorithms were bounded by running time, which grew with several inputs, primarily the number of integer variables, since the tree that must be searched grows exponentially with this number [8]. Modern constraint solvers tend to limit the number of variables only by the amount of memory available to the system and can handle tens of millions of constraints. Complex MILP problems with thousands of integer variables and constraints are solvable by today’s constraint solvers in less than 20 hours; with closer to 500 integer variables, these problems can even be solved in as little as a few minutes [9, 10, 11].
A constraint solver, at the absolute maximum, has to define one variable for each element of the domain of each function that the user memorizes, as well as for any other hidden variables the user defines. Given the limit on memorization time, these clearly number fewer than a hundred. Additionally, new variables are created for modular arithmetic: for every modular equality the system must create an additional integer variable which represents the multiple of the modulus that is the difference between the value before and after the modulus operation. Even if every single perform operation carried out by the user creates a new variable, the limit of thirty seconds per response implies at most about 175 integer variables, and that’s an extremely conservative estimate.
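As a small illustration of the extra integer variable described above, one modular equality can be rewritten as a linear equality. This is only a sketch: the symbols $x$, $y$, $r$, and $k$ are hypothetical names chosen for the example, not notation from the schemas themselves.

```latex
% x, y : digit-valued unknowns from the secret map (0 <= x, y <= 9)
% r    : an observed response digit (0 <= r <= 9)
% k    : the auxiliary integer variable, i.e. the multiple of the modulus
\begin{align*}
  x + y \equiv r \pmod{10}
  \quad\Longleftrightarrow\quad
  x + y - 10k = r, \qquad k \in \mathbb{Z}.
\end{align*}
% Because 0 <= x + y <= 18 and 0 <= r <= 9, the auxiliary variable
% can only take the values k = 0 or k = 1 in this example.
```

Each modular equality of this kind contributes one such $k$, which is why the count of integer variables grows with the number of modular operations.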
These calculations imply that any direct schema can be solved extremely quickly and with expected challenge-response pairs by a standard modern desktop. This would mean that any schema hoping to be unbreakable by a computer in must be indirect.
Unfortunately, we are not yet able to prove this. There do exist certain especially hard MILP problems with a few hundred integer variables that have yet to be solved by state-of-the-art MILP and constraint solvers such as CPLEX and GLPK [12]. While it seems quite likely that the human-usability and time limitations on these schemas would ensure that they are not quite so difficult, further analysis is needed to conclusively show this.
9.2 Methods to Break an Indirect Schema
So how hard is it to break an indirect schema? To consider this, we visualize a set of challenge-response pairs as a tree, where each challenge-response pair is a node, and from each node extends one edge for each possible set of constraints that it could imply. All nodes in a particular level of the tree are the same challenge-response pair, but each one represents a different combination of sets of constraints from the previous challenge-response pairs in the tree. In order to solve the schema, an adversary must traverse the tree, pruning branches as they reveal themselves to be inconsistent. Each time he reaches a new depth, he is given ten more guesses, which he makes based on any so-far consistent paths that he has found.
When an adversary is playing the guessing game, any secret that does not contradict the seen challenge-response pairs is possibly the correct secret. On any particular branch of the tree, as long as there remains at least one secret key that is consistent, the adversary has no way of knowing whether or not that combination of sets of constraints is correct. Thus, the only way the adversary can definitively eliminate a branch of the tree is by showing that there is no secret mapping that is consistent with the assumed constraint sets on that branch.
The best method we’ve found to break a schema with is to perform a simple depth-first search, assuming a particular set of constraints for each challenge-response pair and proceeding as if it is correct. When our algorithm finds that there is no solution that abides by the assumed constraints, it backtracks to the last decision point and tries a different assumption. This is the technique we employed to break our most promising schema, SkipToMyLou\footnote{Thanks to Santosh Vempala for suggesting this schema.}, described below. Note that it may be possible to eliminate a branch from consideration without actually solving it, by instead using some other heuristic that shows its inconsistency. Such an algorithm would be significantly faster than our approach.
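The depth-first search with backtracking described above can be sketched generically as follows. This is a minimal illustration, not the solver used in our experiments: the function names, the callback interface, and the toy instance (a single secret digit, where each pair could encode one of two equalities) are all assumptions made for the example.

```python
def dfs_constraint_sets(pairs, candidate_sets, solve, chosen=()):
    """Depth-first search over one assumed constraint set per pair.

    pairs          -- observed challenge-response pairs, in order
    candidate_sets -- function: pair -> list of constraint sets it could encode
    solve          -- function: list of constraints -> consistent secret or None
    Returns (chosen_sets, secret) for the first consistent branch, else None.
    """
    constraints = [c for constraint_set in chosen for c in constraint_set]
    secret = solve(constraints)
    if secret is None:
        return None  # no secret fits these assumptions: prune and backtrack
    if len(chosen) == len(pairs):
        return chosen, secret
    for constraint_set in candidate_sets(pairs[len(chosen)]):
        found = dfs_constraint_sets(pairs, candidate_sets, solve,
                                    chosen + (tuple(constraint_set),))
        if found is not None:
            return found
    return None


# Toy instance (hypothetical): the secret is one digit x, and each
# pair (a, b) could mean either "x == a" or "x == b".
def toy_candidate_sets(pair):
    a, b = pair
    return [[("eq", a)], [("eq", b)]]


def toy_solve(constraints):
    """Return the smallest digit satisfying every equality, or None."""
    for x in range(10):
        if all(x == value for _kind, value in constraints):
            return x
    return None
```

On the toy pairs `[(1, 2), (2, 3)]`, the search first assumes `x == 1`, finds both options for the second pair inconsistent, backtracks, and then succeeds with `x == 2`.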
10 Example Indirect Schema: SkipToMyLou (STML)
10.1 STML Implementation
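A challenge $c$ consists of $n$ letters $c_1 c_2 \ldots c_n$ and the response consists of digits $r_1 r_2 \ldots r_k$, $k \le n$.
The secret key consists of:
$f$, a random map from the alphabet to digits
Let $\mathrm{STML}(c, f)$ denote the response to $c$ under STML using secret map $f$.
To determine $\mathrm{STML}(c, f)$:
Initialize $t = 0$
For $i = 1$ to $n$:
    $t = (t + f(c_i))$ mod 10
    if $t \ge 5$:
        Output $t$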
This schema is called "SkipToMyLou" because its implementation consists of outputting the running total (mod 10) of the map applied to the challenge but skipping over values that are less than 5.
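The description above translates directly into a few lines of code. The following is a sketch under two stated assumptions: the running total starts at 0, and the secret map is stored as a dictionary from lowercase letters to digits. The function names are hypothetical.

```python
import random
import string


def stml_response(challenge, key):
    """Return the list of STML response digits for a challenge string."""
    total = 0  # assumption: the running total is initialized to 0
    response = []
    for letter in challenge:
        total = (total + key[letter]) % 10
        if total >= 5:  # totals below 5 are skipped rather than output
            response.append(total)
    return response


def random_key(rng):
    """Draw a random map from the 26 lowercase letters to digits 0-9."""
    return {letter: rng.randrange(10) for letter in string.ascii_lowercase}
```

For example, with the partial key `{"a": 3, "b": 4, "c": 8}` the challenge `"abc"` produces running totals 3, 7, 5, so the response is `[7, 5]`: the first total is skipped and the other two are output.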
10.2 Analysis
STML is an excellent example of a simple, humanusable, indirect schema. For a challenge of length the response will be of length in expectation. For an adversary seeing a response of length , he will have to guess which of the elements of the challenge resulted in an outputted digit and which did not (i.e., at which indices was the running total less than 5). This means there are possible constraints that an adversary could apply after seeing one challengeresponse pair. It follows that for a given , STML has an expansion factor of . , which means that a solver can handle STML with with an "OR" constraint. The beauty of this schema is that as grows, the amount of work that the user does increases linearly, while the number of constraint sets the computer must consider grows exponentially. This exciting result implies that STML has the potential to be unbreakable by a computationally bounded adversary in . Figure 5.1 (below) displays the astonishing growth rate of STML’s expansion factor as a function of .
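The expansion factor values reported in Table 5.1 can be reproduced with a short calculation. The sketch below assumes that each running total is at least 5 with probability 1/2, independently across positions, so that the response length is binomially distributed and a response of length $k$ is compatible with $\binom{n}{k}$ choices of output positions. Under that assumption the expectation simplifies to $\binom{2n}{n}/2^n$. The function name is hypothetical.

```python
from math import comb


def stml_expansion_factor(n):
    """Expected number of candidate output-position sets for challenge length n.

    Assumes the response length K is Binomial(n, 1/2); a response of
    length k is compatible with comb(n, k) sets of output positions.
    """
    # E[comb(n, K)] = sum_k P(K = k) * comb(n, k) = sum_k comb(n, k)**2 / 2**n
    return sum(comb(n, k) ** 2 for k in range(n + 1)) / 2 ** n
```

Evaluating this for challenge lengths 3, 4, 5, 10, and 20 gives values that agree with the expansion factor column of Table 5.1 after rounding.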
With a very feasible challenge length of 10,
which well exceeds a constraint solver’s capabilities using an "OR" constraint. Even better, a randomly chosen set of constraints will likely not result in an unsolvable constraint system until the adversary has seen quite a few challengeresponse pairs. Adding a new set of constraints, even an incorrect one, will often still allow the solver to find a secret that is consistent with all seen challengeresponse pairs, but most of the time it will incorrectly guess the next response. This means that our solver has to travel several levels deep into the tree before it is able to label any constraints "impossible" and prune that branch (in our experiments for , the vast majority of eliminations—over 95%—were made at a depth of at least 4). The result is a schema that forces an adversary to attempt to solve the system of constraints tens of millions of times. Where before, with DS3, our program could solve the system and break the schema in a fraction of a second, now it must do so for every possible combination of constraints until it is lucky enough to correctly guess the next response.
The process of solving a system of constraints using a constraint solver can be very effective, but it also can be much slower than just coding the logic directly. Applying and removing constraints from a bulky constraint solver millions of times is a massive slowdown, so instead we constructed a new solver, specific to the constraints of STML. This new solver uses much more memory, but eliminates possible combinations of constraints at a significantly faster rate. Our results from attempting to break STML for varying values of are shown in Table 5.1 below. While exact values for cannot feasibly be calculated, we can achieve a rough estimate as follows:
Define $A$ as the size of the alphabet of a schema’s challenge; the size of the key space is therefore $10^A$. For each challenge-response pair of challenge length $n$ that the adversary sees, in expectation half of the constraints (that is, $n/2$ of them) will be equality constraints, and the other $n/2$ will be inequalities. An equality constraint is expected to reduce the consistent key space by a factor of 10, while an inequality only eliminates half of the possible values, so it reduces the space by a factor of 2. However, there are roughly $2^n$ possible sets of constraints. This means that each challenge-response pair can be expected to reduce the size of the key space by a factor of no less than $10^{n/2} \cdot 2^{n/2} / 2^n = 5^{n/2}$. So, after seeing $m$ challenge-response pairs, the key space should be reduced by about a factor of $5^{mn/2}$. Setting this equal to the size of the key space and solving gives us $m = \frac{2A}{n \log_{10} 5}$.
This roughly approximates the maximum number of challenges that an adversary would need to solve STML; after seeing challenge-response pairs, the adversary should be able to eliminate all but one of the possible secret keys. This means that is an approximate upper bound on ; an adversary is likely to be able to guess the correct response without knowing the exact key because, for any given challenge, a large portion of the key is expected to be irrelevant when calculating the response. Specifically, as increases, the portion of the secret key that is relevant to any given challenge grows, meaning will be better at estimating . Table 5.1 includes values for where .
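The rough estimate above can be evaluated numerically. The sketch below makes three assumptions that are not spelled out in the surviving text: the alphabet has 26 letters, there are roughly $2^n$ candidate constraint sets per pair for challenge length $n$, and the result is rounded up to a whole number of pairs. With these choices the function returns 25, 19, 15, 8, and 4 for challenge lengths 3, 4, 5, 10, and 20, matching the fourth column of Table 5.1. The function name is hypothetical.

```python
from math import ceil, log


def estimated_pairs(n, alphabet_size=26):
    """Estimate how many pairs shrink the key space to a single key.

    Per pair: n/2 equalities (factor 10 each) and n/2 inequalities
    (factor 2 each), weakened by roughly 2**n candidate constraint sets.
    """
    log_key_space = alphabet_size * log(10)                    # ln(10**A)
    log_per_pair = (n / 2) * (log(10) + log(2)) - n * log(2)   # ln(5**(n/2))
    return ceil(log_key_space / log_per_pair)
```

Working in logarithms avoids constructing the very large key-space sizes directly.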
Most challengeresponse pairs  
(challenge length)  (expansion factor)  in a single round  
3  2.5  8.45  25  22 
4  4.38  9.53  19  18 
5  7.88  9.46  15  15 
10  180.43  7.87  8  7 
20  131,460.7  (?)  4  N/A 
Table 5.1: , , , , and the most pairs needed in a single round
in experiments with our STML solver with
Up to , our solver can find a solution to an unknown STML key with expected challenge-response pairs in a very small amount of time (the longest round took about 5 minutes). However, due to the exponential growth of the tree that our solver must search, a user can make it quite a bit harder for our algorithm with a relatively small increase in his own computation. For example, if the user simply doubles , he will now have to spend a little over 20 seconds per response, rather than 10. By contrast, our algorithm would have to brute-force a space that is at least roughly half a million times larger, and probably much more than that. Indeed, even for a large fraction of rounds with , our solver couldn’t break the schema even once despite several days of runtime. It is immediately apparent that a better technique is necessary for the adversary to keep up with this growth.
11 Future Considerations
The existence of a simple, human-usable schema with such a large expansion factor appears to indicate one of two things: either STML is unbreakable by a modern computer in , or there is some way to eliminate branches of the constraint tree via a heuristic that does not require traversing deep into the tree. We wrote two separate solvers to break STML. The first iterates through the possible constraint combinations for each individual challenge-response pair and then adds them to a constraint system for a solver to use. This technique utilizes the branch and bound method, but updating the constraint solver object is much too slow to be done so many times. The second solver uses simple enumeration and elimination of possible mappings and successfully broke STML for . Unfortunately, this method uses quite a bit of memory , and it also has no way of looking ahead to future branches. As such, it performs worse for larger challenges.
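To make the idea of enumeration and elimination concrete, the following toy sketch keeps every candidate map in memory and discards those that disagree with an observed pair. It is not the solver from our experiments: it assumes a four-letter alphabet so that all $10^4$ maps can be listed, assumes the STML running total starts at 0, and uses hypothetical function names. Naive enumeration like this is infeasible for a 26-letter alphabet.

```python
from itertools import product

ALPHABET = "abcd"  # assumption: tiny alphabet so all 10**4 keys fit in memory


def stml_response(challenge, key):
    """STML response; the running total is assumed to start at 0."""
    total, response = 0, []
    for letter in challenge:
        total = (total + key[letter]) % 10
        if total >= 5:
            response.append(total)
    return response


def all_keys():
    """Enumerate every map from ALPHABET to digits as a tuple of digits."""
    return list(product(range(10), repeat=len(ALPHABET)))


def eliminate(candidates, challenge, response):
    """Keep only the candidate keys that reproduce the observed response."""
    survivors = []
    for digits in candidates:
        key = dict(zip(ALPHABET, digits))
        if stml_response(challenge, key) == response:
            survivors.append(digits)
    return survivors
```

Calling `eliminate` once per observed pair shrinks the candidate list; any surviving key reproduces every response seen so far, which illustrates why memory use is the main cost of this approach.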
We imagine that a joint solver that takes advantage of each of these two approaches’ strengths might be able to do significantly better. That is, our hypothetical solver would quickly eliminate possibilities like the lightweight solver, but might use some heuristic defined by the constraint solver to determine which branches it should consider first, with the hopes of increasing its chances of finding the solution quickly or being able to prune closer to the root of the tree.
At the same time, it is quite likely that there are human-usable schemas with significantly larger expansion factors than STML. Because the tree of possible constraint combinations grows exponentially in the size of the expansion factor, a schema with a larger (or perhaps just one that hides more information from a constraint solver) could be vastly more difficult to solve. As we work towards creating a faster solver, we hope to continue to push it to its limits with more and more challenging schemas. The end goal is either a human-usable password schema that a computer cannot break with only examples, or a proof that none exists.
References

 [1] L. F. Cranor, “What’s wrong with your pa$$w0rd?” (2014, March), [Video file]. Retrieved from https://www.ted.com/talks/lorrie_faith_cranor_what_s_wrong_with_your_pa_w0rd?language=en.
 [2] R. Morris and K. Thompson, “Password security: A case history,” Commun. ACM, vol. 22, no. 11, pp. 594–597, Nov. 1979. [Online]. Available: http://doi.acm.org/10.1145/359168.359172
 [3] N. J. Hopper and M. Blum, Advances in Cryptology — ASIACRYPT 2001: 7th International Conference on the Theory and Application of Cryptology and Information Security Gold Coast, Australia, December 9–13, 2001 Proceedings. Berlin, Heidelberg: Springer Berlin Heidelberg, 2001, ch. Secure Human Identification Protocols, pp. 52–66. [Online]. Available: http://dx.doi.org/10.1007/3540456821_4
 [4] M. Blum and S. Vempala, “Publishable humanly usable secure password creation schemas,” in Proceedings of the Third AAAI Conference on Human Computation and Crowdsourcing, 2015. [Online]. Available: https://www.aaai.org/ocs/index.php/HCOMP/HCOMP15/paper/viewFile/11587/11430
 [5] J. Blocki, M. Blum, and A. Datta, “Human computable passwords,” CoRR, vol. abs/1404.0024, 2014. [Online]. Available: http://arxiv.org/abs/1404.0024
 [6] S. Allison, J. Blocki, and M. Blum, “A secure humancomputable authentication scheme from the kjunta problem,” from personal communication.
 [7] N. J. Driebeek, “An algorithm for the solution of mixed integer programming problems,” Management Science, vol. 12, no. 7, pp. 576–587, 1966. [Online]. Available: http://dx.doi.org/10.1287/mnsc.12.7.576
 [8] M. Benichou, J. M. Gauthier, P. Girodet, G. Hentges, G. Ribiere, and O. Vincent, “Experiments in mixedinteger linear programming,” Mathematical Programming, vol. 1, no. 1, pp. 76–94, December 1971. [Online]. Available: http://dx.doi.org/10.1007/BF01584074
 [9] J. Zhou, “Computational Experiments for Local Search Algorithms for Binary and Mixed Integer Optimization,” Master’s thesis, Massachusetts Institute of Technology, September 2010.
 [10] V. Jain and I. E. Grossmann, “Algorithms for hybrid milp/cp models for a class of optimization problems,” INFORMS J. on Computing, vol. 13, no. 4, pp. 258–276, Sep. 2001. [Online]. Available: http://dx.doi.org/10.1287/ijoc.13.4.258.9733
 [11] C. Timpe, “Solving planning and scheduling problems with combined integer and constraint programming,” Operations ResearchSpektrum, vol. 24, pp. 431–448, October 2002.
 [12] “IBM guidelines for estimating cplex memory requirements based on problem size,” https://www01.ibm.com/support/docview.wss?uid=swg21399933, accessed: 20160426.