# Studying the Impact of Managers on Password Strength and Reuse

Sanam Ghorbani Lyastani¹, Michael Schilling², Sascha Fahl³, Sven Bugiel¹, Michael Backes⁴

¹CISPA, Saarland University, ²Saarland University, ³Leibniz University Hannover, ⁴CISPA Helmholtz Center i.G.

###### Abstract

Despite their well-known security problems, passwords are still the incumbent authentication method for virtually all online services. To remedy the situation, end-users are very often referred to password managers as a solution to the password reuse and password weakness problems. However, to date the actual impact of password managers on password security and reuse has not been studied systematically.

## I Introduction

In the past, different solutions have been implemented to help users create stronger passwords, such as password meters and policies, which are still the subject of active research [42, 53, 18, 46, 68]. Among the solutions most often recommended to end-users for these problems [29, 58, 52, 61, 55] is technical support in the form of password management software. Those password managers come as integrated parts of our browsers, as browser plugins, or as separate applications. Password managers are recommended as a solution because they fulfill important usability and security aspects at the same time: they store all the users’ passwords so that the users do not have to memorize them; they can help users enter their passwords by automatically filling them into log-in forms; and they can offer help in creating unique, random passwords. Today, there are several examples of third-party password managers that fit this description, such as LastPass [5], 1Password [1], and even seemingly unrelated security software such as anti-virus solutions [4].

##### Outline

The remainder of this paper is organized as follows. Section II gives an overview of related work on studying password security. Section III presents the methodology of our study and data collection. We describe the results and analysis of our collected data in Section IV and then discuss implications and limitations of our study in Section V. Finally, we offer some concluding remarks in Section VI.

## II Related Work

Text passwords have been the incumbent authentication scheme for online services [30, 31] for decades [45], and will very likely remain in that position for the foreseeable future. They distinguish themselves from alternative schemes through their very intuitive usage and easy deployment, but also through a pathological inability of users to create strong passwords that withstand guessing attacks [12]. Given the permanence of passwords, end-users are commonly referred to technical help in the form of password management software [29, 58, 52, 55] to create strong, unique passwords.

In this paper, we aim to better understand how password managers help users in this task and to try to measure the impact password managers actually have on the current status quo. We do this through a comprehensive study that includes both self-reported user strategies and factors for password creation and storage as well as in-situ collected password metrics and questionnaire answers. To put our approach into the larger context and to provide necessary background information, in this section we give an overview of prior research on how users select passwords, how users (re-)use passwords, how password strength can be measured, and on dedicated studies of password manager software.

Different works have studied the strategies of users and the factors that influence the selection of new passwords. For instance, users create passwords based on something that has relevance to them or has meaning to them [55], and very often passwords are based on a dictionary word [39, 51].

Password strength has been studied for several years. There are different mechanisms that can be used to measure the strength of passwords. The Shannon entropy [23] equation provides a way to estimate the average minimum number of bits needed to encode a string of symbols, based on the frequency of the symbols. This formula was formerly used by the NIST guidelines [29] to estimate the strength of a password based on its length. However, more recent research [67, 11, 20, 41] argued that guessability metrics are more realistic than the commonly used entropy metrics, and recommendations such as those of NIST [29] recently picked up the results of this line of research and were updated accordingly. One of the vital insights from this and other research [35] was that passwords are not chosen randomly but exhibit common patterns and are derived from a limited set of dictionary words.
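As a rough illustration of the entropy estimate described above, the following Python sketch computes the Shannon entropy of a string from its empirical symbol frequencies. The function name `shannon_entropy_bits` and the sample strings are assumptions chosen for illustration; the sketch is not the procedure of the NIST guidelines or of any cited work.

```python
import math
from collections import Counter


def shannon_entropy_bits(text: str) -> float:
    """Return the Shannon entropy per symbol (in bits) of `text`,
    estimated from the empirical frequency of each symbol."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Hypothetical sample strings, not taken from the collected data.
    for sample in ("aaaaaaaa", "abababab", "mOuse@84"):
        per_symbol = shannon_entropy_bits(sample)
        print(sample, per_symbol, per_symbol * len(sample))
```

A string of eight distinct symbols such as "mOuse@84" obtains the maximum per-symbol value for its length under this measure, even though it is derived from a dictionary word. This is one way to see why frequency-based entropy can overstate strength compared to guessability metrics.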

Measuring a password’s guessability has been realized in different ways. Those include Markov models [14, 21], pattern matching plus word mangling rules [68, 66], or neural networks [46]. Since prior password strength meters were based on the password composition and the resulting entropy, those new approaches also found their way into contending password strength meter implementations [68, 46, 59]. However, different cracking algorithms or techniques can yield different password strength results depending on configuration, methods, or training data [62]. In our study, we also measure the password strength based on guessability, using the openly available zxcvbn [68] tool.

### II-D Security and usability of password managers

Password manager software has also been the subject of research. For instance, human-subject studies [40, 15] have shown that password managers might suffer from usability problems and that ordinary users might abstain from using them due to trust issues or because they do not see a necessity. Like any other software, password managers might also contain vulnerabilities or errors [44, 71] that can compromise user information, and new guidance for developers of password management software was derived. The integration of password managers, in particular password auto-filling, was also scrutinized [54, 56], and vulnerabilities were discovered that can help an adversary to sniff passwords during the auto-fill process.

## III Methodology

For our study of password managers’ impact on password strength and reuse, we use data collected from paid workers of Amazon’s crowd-sourcing service Mechanical Turk. We collected the data in three different stages: 1) an initial survey sampling, 2) collection of in-situ password metrics, and 3) an exit survey. In the following, we describe those three stages in more detail.

##### Ethical concerns

The protocols implemented in those two stages were approved by the ethical review board of the faculty of Mathematics and Computer Science at Saarland University. We also took the strict German data and privacy protection laws into account for collecting, processing, and storing any participant information. Further, we followed the guidelines for academic requesters outlined by MTurk workers [22]. All server-side software (i.e., LimeSurvey Community Edition and a self-written server application for our plugin-based data collection) was self-hosted on a maintained and hardened university server. Web access to the server was secured with an SSL certificate issued by the university’s computing center and all further access was restricted to the department’s intranet and only made available to maintainers and collaborating researchers. Participants could leave the study at any time during the two stages.

In our survey sampling, we asked the participants about their general privacy attitude, their attitude towards passwords, their skills and strategies for creating and managing passwords, as well as basic demographic questions. This information enables us, on the one hand, to gain a general overview of common password creation and storage strategies in the wild. On the other hand, it helps us in detecting and avoiding any potential biases in the later stages of our study. The full survey contained 31–34 questions, depending on conditional questions, categorized in 6 different groups (see Appendix A).

All qualitative answers (e.g., free text answers to Q9 or Q22 in Appendix A) were independently coded in a bottom-up fashion by two researchers. For the coding tasks, the researchers achieved an initial agreement between 95.6% (Q9) and 97.1% (Q22), and all differences could be resolved in agreement.

Participation in the survey was open to any MTurk worker who fulfilled the following criteria, which we copied from MTurk-based studies in psychological research: the worker has to be located in the US and the number of previously approved tasks has to be at least 100 or at least 70% of all tasks. By using MTurk, we ensured that all participants were at least 18 years old. The estimated time for answering the survey was 10–15 minutes and we paid workers $4 for participation. In total, 505 MTurk workers participated in our survey between August 2017 and October 2017. After discarding responses that failed attention test questions [34], were answered too fast to be done thoughtfully, or that were duplicates (e.g., same human worker with different IDs), we ended up with 476 valid responses. Lastly, we also asked whether the participant would be willing to participate in a follow-up study, in which we measure in an anonymized, privacy-protecting fashion the strength and reuse of their passwords. Only participants who indicated interest in the follow-up study were considered potential candidates for our Chrome plugin-based data collection. Only 21 workers were not interested.

### III-B Chrome plugin based data collection

To collect in-situ data about passwords, including their strength, reuse, entry method, and domain, we created a Chrome browser plugin that monitors the input to password fields of loaded websites and then sends all collected metrics back to our server once the user logs into the loaded website. We distributed our Chrome plugin via the Google Web Store to invited participants. The plugin was unlisted in the Google Web Store, so that only participants to whom we sent the link to the plugin store website were able to install it.

Our primary criterion for selecting participants was that they use Chrome as their primary browser and do not exclusively use mobile devices (smartphones and tablets) to browse the web; besides that, we aimed for an unbiased sampling from the participant pool with respect to the participants’ privacy attitude, attitude towards passwords, demographics, and usage of password managers. Between September and October 2017, we invited 364 participants from the survey sampling via MTurk to the study, of which 174 started and 170 finished participation. Participants who finished the task were compensated with $20.

#### Composition

The length of the password as well as the frequency of each character class.
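A minimal sketch of how such composition metrics could be computed is given below. The four character classes (lowercase, uppercase, digit, special) and the function name `password_composition` are assumptions for illustration; the text above does not specify which classes were distinguished or how the plugin computed them.

```python
import string
from collections import Counter


def password_composition(password: str) -> dict:
    """Return the length of `password` and how many of its characters
    fall into each of four assumed character classes."""
    classes = Counter()
    for ch in password:
        if ch in string.ascii_lowercase:
            classes["lower"] += 1
        elif ch in string.ascii_uppercase:
            classes["upper"] += 1
        elif ch in string.digits:
            classes["digit"] += 1
        else:
            # Everything else, including punctuation and non-ASCII characters.
            classes["special"] += 1
    return {
        "length": len(password),
        "lower": classes["lower"],
        "upper": classes["upper"],
        "digit": classes["digit"],
        "special": classes["special"],
    }


if __name__ == "__main__":
    # Hypothetical example string; only aggregate counts are derived from it.
    print(password_composition("mOuse@84"))
```

Only the aggregate counts leave the function, which mirrors the idea of collecting metrics about a password rather than the password itself.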

## IV Studying Password Managers’ Impact

In this section, we analyze our collected data, but leave the discussion of the results for Section V. We start with our participants’ demographics. We then present a brief overview of our participants’ password reuse and strength in general and afterwards introduce a grouping of our participants based on their creation strategy. We then present a short case-study of LastPass users. Finally, we study the impact of different management and creation strategies on the password reuse and strength by exploring correlations between those factors.

### IV-A Demographics

Table III provides an overview of the demographics of the participants who answered our survey on passwords, whom we invited to the plugin-based study, and who participated in the plugin-based data collection. Notably, we invited participants in equal parts from every demographic group, and every demographic group also participated in almost equal parts in the plugin-based data collection. We use a Mann-Whitney rank test [26] to test for significant differences between the demographic distributions of the 476 participants in the survey sampling and the 170 participants in the plugin-based study, and could not find any statistically significant differences between those two groups. In general, our participants’ demographics are closer to the commonly observed demographics of qualitative studies in university settings than to the demographics of the 2010 US census [63]. Our sample is skewed towards male participants (57.6% identified themselves as male). Also, our participants covered an age range from 18 to more than 70 years, where our sample skews to younger participants (75.2% of our study participants are younger than 40), as can be commonly observed in behavioral research, including password studies and usable security. The majority of our participants had no computer science background (80.88%) and were English-speaking (98.3%). Most of the participants identified themselves as of white/Caucasian ethnicity (74.6%). The participants also covered a range of educational levels, where 14.3% are high school graduates, 16.6% have an associate’s degree, 36.6% have completed a Bachelor’s degree, 0.4% have a doctoral degree, and 6.7% have completed a graduate or professional degree. Further, 80.9% of our participants reported using Chrome as their primary browser (see Table IV).
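The mechanics of the rank test mentioned above can be sketched in a few lines of Python. The function `mann_whitney_u` and the coded education levels in the usage example are hypothetical; the sketch uses midranks for ties and a normal approximation for the two-sided p-value, and it does not reproduce our data or the output of any particular statistics package.

```python
import math
from collections import Counter


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test with midranks for ties and a
    normal approximation. Both samples must be non-empty.
    Returns (U_a, U_b, p_value)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = sorted(list(sample_a) + list(sample_b))
    # Give each distinct value the average rank of the positions it occupies.
    midrank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_sum_a = sum(midrank[v] for v in sample_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    n = n_a + n_b
    # Tie-corrected variance of U under the null hypothesis.
    tie_term = sum(t**3 - t for t in Counter(pooled).values()) / (n * (n - 1))
    variance = n_a * n_b / 12 * ((n + 1) - tie_term)
    if variance == 0:
        return u_a, u_b, 1.0  # all observations are identical
    z = (u_a - n_a * n_b / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, u_b, p_value


if __name__ == "__main__":
    # Hypothetical ordinal codes (e.g., education level 1..5) for two groups.
    survey = [1, 2, 2, 3, 3, 3, 4, 5]
    plugin = [2, 2, 3, 3, 4, 4]
    print(mann_whitney_u(survey, plugin))
```

For small samples an exact test is usually preferable to the normal approximation used here.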

As shown in Table VII, the majority of the 1,767 logged passwords was entered with Chrome’s auto-fill (53.71%) followed by manual entry (33.39%). Although in our pilot study various password manager plugins, e.g., KeePass and 1Password, had been correctly detected, in our actual study only LastPass was used by our participants (128 or 7.24% of all passwords). Copy&paste and unknown Chrome plugins formed the smallest, relevant-sized shares and only four passwords were entered programmatically by an external program.

### IV-C Grouping based on creation strategy

We grouped our participants based on their self-reported strategies for creating new passwords (see Q9, Q13, and Q15 in Appendix A). Based on their answers, we discovered a dichotomous grouping:

#### Group 2: Human-generated ("Human")

We discovered that all 121 remaining participants described a strategy that abstains from using technical means (like password managers). Almost all of the participants in this group reported that they "try to come up with a (random) combination of numbers, letters, and characters," which prior work has shown to be prone to efficient statistical, data-driven attacks [11, 21, 46, 32]. For instance, one participant symptomatically reported: "I think of a word I want to use and will remember like. mouse. I then decide to capitalize a letter in it like mOuse. I then add a special character to the word like mOuse@. I then decided a few numbers to add like mOuse@84." Only a very small subgroup of seven participants reported using analog tools to create passwords, such as dice or books ("I have a book on my desk I pick a random page number and I use the first letter of the first ten words and put the page number at the end and a period after."), or using passphrases ("i use song lyrics then add a random word at the end").

Many of the participants in this group also hinted in their answers at their password storage strategies. For instance, various participants emphasized ease of remembering as a criterion for new passwords (e.g., "something easy to remember, replace some letters with numbers."; see also Table VIII), while others use analog or digital storage (e.g., "I try to remember something easy or I right[sic] it down on my computer and copy&paste it when needed."). Many participants also outright admitted re-using passwords as part of their strategy (e.g., "I use the same password I always use because it has served me well all these years" and "I have several go to words i use and add numbers and symbols that i can remember").

#### IV-C1 Group demographics

We provide an overview of the groups’ demographics in Table IX. We again used a Mann-Whitney rank test to detect any significant differences in the demographic distributions of those two groups. We find that the two groups have statistically significantly different distributions for gender, computer science background, and attitude towards passwords. More participants in group 1 (PWM) identified themselves as male in comparison to group 2 (Human). The fractions of participants that have a computer science background and that are optimistic about passwords are higher in the group of password manager users. Gender and computer science background are significantly correlated for our participants (Fisher’s exact test), as are computer science background and password attitude (chi-square test). One hypothesis for this distribution could be that computer science studies historically had more male students and that their technical background may have induced awareness of the importance of passwords as a security measure and the promised benefits of password managers.

#### IV-C2 Comparison of password strength and reuse

Figures 7 and 8 provide a comparison of the password strength and reuse between the two participant groups. The hatched bars indicate the overall number of passwords per zxcvbn score and reuse category, respectively. The plain bars break the number of passwords down by entry method. Participants in group 1 (PWM) entered in total 522 passwords and participants in group 2 (Human) entered in total 1245 passwords (both numbers include reused passwords, see Table X).

For password strength (see Figure 7), neither group contained a noticeable fraction of the weakest passwords with score 0. However, group 2 shows a clear tendency towards weaker passwords. For instance, there are almost twice as many score 1 passwords as score 4 passwords. In contrast, the most frequent score for group 1 is 2, but the distribution shows a lower kurtosis (e.g., scores 1, 3, and 4 have the frequencies 126, 113, and 114). When breaking the number of passwords down by their entry method, Chrome auto-fill is the dominating entry method for all zxcvbn scores 1–4 in both groups except for score 1 in group 1, where manually entered passwords are most frequent. However, for group 1 the fraction of passwords entered with LastPass’ plugin (93 or 17.82% of the passwords) is considerably larger than for group 2 (35 or 2.81%). In particular, for group 1, passwords entered with LastPass mostly have scores higher than 2, where score 4 is the most frequent.

Regarding password reuse (see Figure 8), the most frequent category is exactly-and-partially reused (189 or 36.21% for group 1; 555 or 44.58% for group 2). However, group 1 shows a bimodal distribution in which not-reused passwords are almost as frequent (187) as exactly-and-partially reused ones. Further, Chrome auto-fill is the dominating entry method for all reuse categories in both groups. However, when breaking the passwords down by entry method, more than half (49 or 52.69%) of the passwords entered with LastPass in group 1 have not been reused in any way. The vast majority of reused passwords can be attributed to manual entry and Chrome auto-fill. In group 1, 335 or 64.18% of the passwords have been reused, and in group 2, 979 or 78.63% of the passwords. Of the 335 reused passwords in group 1, 278 or 82.99% were entered manually or with Chrome auto-fill. In group 2, 926 of the 979 reused passwords (94.59%, or 74.38% of all passwords in the group) were entered manually or with Chrome auto-fill.
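The reuse shares reported above can be recomputed from the raw counts, which also makes the respective denominators explicit. The helper `share` and the variable names below are hypothetical; the counts are the ones stated in the text.

```python
def share(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Counts as reported in the text (group 1 = PWM, group 2 = Human).
TOTAL_PWM, TOTAL_HUMAN = 522, 1245
REUSED_PWM, REUSED_HUMAN = 335, 979
REUSED_MANUAL_OR_AUTOFILL_PWM = 278
REUSED_MANUAL_OR_AUTOFILL_HUMAN = 926

if __name__ == "__main__":
    print(share(REUSED_PWM, TOTAL_PWM))      # reused share, group 1
    print(share(REUSED_HUMAN, TOTAL_HUMAN))  # reused share, group 2
    # Share of reused passwords entered manually or via Chrome auto-fill.
    print(share(REUSED_MANUAL_OR_AUTOFILL_PWM, REUSED_PWM))
    print(share(REUSED_MANUAL_OR_AUTOFILL_HUMAN, REUSED_HUMAN))
    # The same count relative to all passwords of group 2.
    print(share(REUSED_MANUAL_OR_AUTOFILL_HUMAN, TOTAL_HUMAN))
```

Note that 74.38% is the share of the 926 passwords among all 1245 passwords of group 2; relative to the 979 reused passwords of that group, the share is 94.59%.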

### IV-D Case-study: Active LastPass users

We were interested in how consistently users of password managers employ their tools during their normal web browsing. Our dataset contains 15 users that entered at least one password with a known password manager plugin and for which we are hence certain that they are users of a password manager solution (e.g., we cannot be certain about users that copy&paste all their passwords from a manager into the password fields). All of those 15 users employ LastPass as their manager. Figure 9 gives an overview of those 15 users’ password properties.

We can observe that, except for user 3, all users entered passwords through at least one additional entry method, and most even through two. However, LastPass’ plugin is the primary entry method for 10 of the 15 users, and on average 52% of the passwords in this selection were entered through LastPass (SD=31%). Interestingly, user 3 gave no indication in the survey sampling of using a password manager and hence is in group 2 (Human), but was the only user to enter all passwords through LastPass. Of the 15 users, four did not enter any strong password with zxcvbn score 4, and every user entered at least one password with a zxcvbn score smaller than 4 through LastPass. Nevertheless, the mean zxcvbn score (mean=2.72, SD=0.58) of this selection of participants is above the global average. All but user 11 reused at least one password either partially, exactly, or both. User 15 even reused all of their passwords. The average user in this selection reused 60% of their passwords, which is below the global average in our dataset.

Although there are some users that seem to particularly benefit from using LastPass (e.g., users 3, 6, 11, and 12 are heavy LastPass users with low reuse and strong passwords), we could neither confirm nor refute a statistically significant correlation between the ratio of LastPass passwords per user and the ratio of either strong or non-reused passwords: presuming a small or medium-sized effect, the number of LastPass users in our dataset is too small for a statistical test with sufficient power.

### IV-E Modelling password strength and reuse

To get a better understanding of the influencing factors for password strength and reuse, we conducted several regression analyses that include the effects of the users’ strategies as well as their password manager usage. To account for the hierarchical structure of our data, where individual password entries are grouped under the corresponding participant, we calculated multi-level (aka hierarchical) logistic or ordinal regressions that allow the intercepts to vary at the participant level. A comparison of simple and multi-level models for reuse and strength demonstrated the significant superiority of the latter. Thus, we report here only the final multi-level regression models and how they were constructed.
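For readers less familiar with multi-level models, the general form of a random-intercept logistic model and a random-intercept cumulative-logit (ordinal) model can be written as follows. The notation is an illustrative assumption and not the fitted models themselves: $i$ indexes a logged password entry, $j$ the participant, $x_{kij}$ the $k$-th predictor, $u_j$ the participant-level deviation of the intercept, and $\theta_c$ the thresholds between adjacent zxcvbn scores.

$$
\begin{aligned}
\operatorname{logit}\Pr(\mathrm{reused}_{ij} = 1) &= \beta_0 + u_j + \sum_{k} \beta_k x_{kij}, \qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \\
\operatorname{logit}\Pr(\mathrm{score}_{ij} \le c) &= \theta_c - \Big(u_j + \sum_{k} \gamma_k x_{kij}\Big), \qquad c = 0, \dots, 3 .
\end{aligned}
$$

A simple (single-level) regression corresponds to fixing $\sigma_u^2 = 0$, so that all participants share the same intercept.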

#### IV-E1 Correlation analysis

##### Website category as proxy for website value

Commonly, the website category is used as a proxy for the website value. Since we collected both the self-reported website value from the in-situ questionnaire and the website category from the domain, we can provide insights into this general assumption. Figure 10 shows the self-reported value per respective domain. For instance, in more than 70% of all logged passwords for a financial domain, the user reported a very high value for that domain. Similarly, in more than 60% of all logged passwords for news websites, the users (strongly) disagreed that this domain has a high value. Unfortunately, domains with an unknown category, which form the bulk of our logged passwords (632 or 35.8% of all logged entries; see Table XI), did not show a clear tendency towards high or low value.

Although prior works [50] used the website category in comparable models, we decided, for the reasons stated above, to use the self-reported website value instead of the website category as a predictor in our regression models.

#### IV-E2 Constructing the models

Selecting an appropriate model corresponding to the empirical data is a crucial step in every regression-based analysis. This process ensures that only sets of variables are included that significantly explain variation in the empirical data. To this end, for both password reuse and password strength prediction, we started with a base model without any explanatory variables, which we iteratively extended with additional predictors. Tables XII and XIII present the goodness of fit for the relevant steps in this model building process. According to the scale level of our dependent variable and the hierarchical structure of our data, we built an ordinal multi-level regression model for the zxcvbn scores (Table XII) and a logistic multi-level regression model for the password reuse (Table XIII). To verify that a multi-level approach suits our data better than a simple regression model, we first tested basic multi-level models for password reuse and password strength without any explanatory variables against the corresponding simple regression models. Both multi-level models fitted the data significantly better.

##### Process of model fitting

Throughout the main process of model selection, we extended both multi-level models in three steps by adding sets of predictors: 1) variables measured at the login level: a) the entry method of the password, b) the self-reported website value, and c) the self-reported password strength; 2) variables measured at the user level: a) the number of submitted passwords per user, b) the password creation strategy of the user, and c) the self-reported password management strategies of the user; 3) the cross-level interactions between the user’s password creation strategy and the detected entry method.

This approach not only allows us to evaluate the effects of the individual explanatory variables, but also to investigate the interplay between different storage strategies and the password creation strategy of the users. In each iteration, we computed the model fit and used a log-likelihood model fit comparison to check whether the new, more complex model fit the data significantly better than the previous one. As our final model, we picked the one with the best fit that was significantly better in explaining the empirical data than the previous models. This is a well-established procedure for model building in, e.g., social sciences and psychological research [33, 26, 10, 16], and allows creation of models that have the best trade-off of complexity, stability, and fitness.

##### Selecting the appropriate model

All models are compared according to the corresponding Akaike information criterion (AIC), which is an estimator of the relative quality of statistical models for a given set of data. Smaller AIC scores indicate a better fitting model. Additionally, the models are statistically compared using likelihood-ratio tests, which are evaluated using a chi-squared distribution. The final model is selected based on its AIC as well as its ability to describe the empirical data better than the previous models.
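The two comparison criteria can be sketched as follows. The function names and the log-likelihood values in the usage example are hypothetical and do not correspond to the values in Tables XII and XIII; the chi-squared tail probability is computed with only the standard library, using the recurrence $Q(a+1, y) = Q(a, y) + y^a e^{-y} / \Gamma(a+1)$ for the regularized upper incomplete gamma function.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L). Smaller is better."""
    return 2 * n_params - 2 * log_likelihood


def chi2_sf(x: float, df: int) -> float:
    """Upper tail probability P(X > x) of a chi-squared distribution
    with integer degrees of freedom df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    else:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    # Step from Q(a, y) to Q(a + 1, y) until a equals df / 2.
    while a < df / 2:
        q += y**a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def likelihood_ratio_test(loglik_simple: float, loglik_complex: float,
                          extra_params: int):
    """Compare two nested models; returns (test statistic, p-value)."""
    statistic = 2 * (loglik_complex - loglik_simple)
    return statistic, chi2_sf(statistic, extra_params)


if __name__ == "__main__":
    # Hypothetical log-likelihoods of a base model and an extended model
    # with three additional predictors.
    base_ll, extended_ll = -1000.0, -990.0
    print(aic(base_ll, 2), aic(extended_ll, 5))
    print(likelihood_ratio_test(base_ll, extended_ll, extra_params=3))
```

The likelihood-ratio test is only valid for nested models, which holds in a stepwise procedure where each model extends the previous one by additional predictors.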

#### IV-E3 Model for zxcvbn score

For the zxcvbn score an ordinal model with all predictors and also the mentioned interaction described our data best. The model is presented in Table XIV.

The interactions between the self-reported password creation strategy (q9:generator; see Q9 in Appendix A) and the detected entry methods Chrome auto-fill, copy&paste, and LastPass were significant predictors in our model. Those entry methods and also the creation strategy are not significant predictors of password strength on their own. This means that using such a password management/entry tool only leads to significant improvement in the password strength when the users also employ some supporting techniques (password generator) for the creation of their passwords.

The model might suggest that a general password entry with a plugin (other than LastPass in our dataset) increased the likelihood of a strong password. However, this could be attributed to the high standard error resulting from the minimal data for this entry method.

Moreover, the self-reported password strength was a significant predictor of the measured password strength. This indicates that the users may have a very clear view of the strength of the passwords they have entered.

#### IV-E4 Model for password reuse

In addition, we found a positive relation between the number of passwords entered by a user and the reuse of these passwords. In our model, each additional password of the user increases the odds that it will be reused by 6% (odds ratio 1.06). This suggests that with increasing numbers of passwords, it becomes more likely that some of them will be reused, which is in line with prior research [28].

We also found the self-reported website value and the self-reported password strength to be statistically significant predictors of reuse [9]. Passwords entered on a website with a higher value for the user were less likely to be reused (odds ratio of 0.87), and passwords that the users considered stronger were also less likely to be reused (odds ratio of 0.81).

Lastly, users that reported using an analog password storage (q14:analog; see Q14 in Appendix A) were less likely to reuse their passwords (odds ratio of 0.62).
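To make the reported odds ratios more tangible, the following sketch converts an odds ratio into a change in reuse probability. Odds ratios act multiplicatively on the odds $p / (1 - p)$, not on the probability itself, so the resulting probability depends on the baseline. The function `apply_odds_ratio` and the baseline probability of 0.5 are assumptions for illustration and are not estimates from our models.

```python
def apply_odds_ratio(baseline_probability: float, odds_ratio: float,
                     steps: int = 1) -> float:
    """Return the probability obtained by multiplying the baseline odds
    by `odds_ratio` once per unit step of the predictor.
    `baseline_probability` must lie strictly between 0 and 1."""
    odds = baseline_probability / (1 - baseline_probability)
    new_odds = odds * odds_ratio ** steps
    return new_odds / (1 + new_odds)


if __name__ == "__main__":
    baseline = 0.5  # hypothetical baseline reuse probability
    for label, ratio in (("one more password", 1.06),
                         ("higher website value", 0.87),
                         ("higher perceived strength", 0.81),
                         ("analog storage", 0.62)):
        print(label, apply_odds_ratio(baseline, ratio))
```

With `steps` greater than one, the odds ratio is applied repeatedly, e.g., for a user with ten additional passwords.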

## V Discussion

We discuss the results and limitations of our study on password managers’ impact on password strength and reuse.

In general, our participants showed very similar password strength and reuse characteristics as in prior studies [50, 65] and our analysis could also reaffirm prior results, such as rampant password reuse and high share of low-strength passwords, and extend them, e.g., when asked in-situ users made very accurate estimates of their passwords’ strength.

For our participants, we discovered a dichotomous distribution of self-reported creation strategies. Participants indicated using a password generator right now or in the recent past, or clearly described mental algorithms and similar methods for human-generated passwords; only a negligible fraction of participants mentioned analog tools or alternative strategies (like two-factor authentication). Taking a differentiated view based on the creation strategies, we find that users of a password generator are closer to a desirable situation with stronger, less reused passwords, although being far from ideal.

Our models further suggest that the use of password generators and the website value also significantly reduced the chance of password reuse. More interesting, however, is that the password storage strategies have different influences independently of an interaction with the creation strategy. Using a password manager plugin or copy&pasting passwords reduced password reuse, while Chrome’s auto-fill aggravated reuse. In other words, we observed that users were able to manually create more unique passwords when managing their passwords digitally or with a manager, but not with Chrome auto-fill.

The benefit of password managers is also put into better perspective when considering particular strategies in our group 2 (human-generated passwords). We noticed that users tend to have a "self-centered" view when it comes to passwords’ uniqueness (i.e., personal vs. global), but are unaware of the fact that an attacker would not be concerned with personal uniqueness of passwords. A large fraction of users reported to "come up with [a password they] have never used before", to "use some words [they] would be familiar with but other weren’t", or to "try to think of something that [they] have never used before". Those results also align with prior studies of password behavior [55, 51, 39].

Further, in light of the high relevance of copy&paste for strong and unique passwords, our results can also underline the "Cobra effect" [36, 37, 47] of disabling paste functionality for password fields on websites to encourage the use of two-factor authentication or usage of password managers.

### V-B Impressions of password managers

Lastly, we collected our participants’ impressions and opinions about password managers at different points in our study (survey sampling and exit survey). This collected information provides insights into why users abstain from using password managers, to which contexts users restrict usage of managers, and also hints at misunderstood security benefits of managers.

#### Password managers as single point of failure

We also noticed a high degree of distrust of password managers in our participants’ answers in the survey sampling. When asked about their impressions of password managers, various participants expressed concerns about managed passwords as a single point of failure. The concerns included software security issues, such as

• "I think that it saves time but also generates a way for hackers to steal the information for themselves."

• "It’s not that secure, if someone managed to hack me or get a virus in they’d get everything stored in there."

• "I would never use my browser’s manager should my computer ever be hacked. I do trust LastPass for my pass word protection and do change my passwords on a regular basis. I keep my LastPass main password written down in a secure place in my home."

• "I can see using the password saving feature of a browser as being convenient, but it leaves the user vulnerable to hackers who find a security breach."

• "I feel that using my browser password saving feature is dangerous so I hate to think about it. If hackers hack just my browser they would have a bunch of passwords. I don’t store them all on there for that reason."

• "I don’t like using the browser saving feature, makes me feel like someone can hack into my browser, like Google and then get to my passwords. I do use it though for non-security sites, like signing up for emails from an on-line store, but not for anything that needs stronger security. I like some of the password managers, but I feel like I am not utilizing them efficiently and also feel like they could still be hacked. I am probably just paranoid. Still makes me nervous, though."

• "I try not to use the save password feature because of an exploit found not too long ago that allowed people to steal your passwords by using that feature. Lastpass is very easy to use and is a good way to store the passwords and it also lets you generate new passwords that are safe to use. It makes it so that you don’t have to try and come up with your own hand crafted passwords."

• "I don’t like using the browser to store them because I use the same browser across devices and wouldn’t want someone to be able to get my passwords if my phone was stolen. I like using LastPass because it has a password protecting all of the other passwords. It feels safer than just saving them with the browser."

• "I don’t feel like it is the most secure. What if someone steals my laptop? But, I use it anyway, because the reward outweighs the risk. Most likely no one will steal my laptop. And I do not carry it around with me, it stays at home."

• "My impression is fairly good using the browser. Unless my computer or phone is stolen and then hacked I’m pretty safe."

• "It’s probably not the safest thing in the world but I don’t think my laptop will get stolen and no one else has access to my laptop soo…"

• "I think for personal use, using a browser’s password saving feature is fine. It’s true that someone could steal the computer and get in, but I consider that to be an acceptable risk in the name of ease of use. […]"

• "I use it to automatically save some of my passwords on some sites so that I don’t have to manually type it every time. It is useful, and I like it. My only concern is that it could provide unwanted access if my computer were stolen, but given how and where I use it this is fairly unlikely."

• "I guess the browser built in password savers are password managers, but they are not secure. any number of programs can dump the passwords, but they are convenient for loggin [sic] and it does save copying and pasteing [sic]. I am the only one using this computer so I would have to scramble if it were every stolen or compromised"

#### Misunderstood security benefits of password managers

Lastly, we also noticed a few cases in which users attributed to password managers security benefits that this software does not offer. For instance, one user noted that a password manager offers protection against password leakage through keyloggers ("Using my browser’s password saving feature is a matter of convenient as it allows me to not have to always type in my password and it’s a feature that I feel is fairly secure. Also, not having to type in a password, allows me to bypass keylogger and other risks that may be associated with frequent use of typing in password."), which is only true in a limited set of scenarios (e.g., compromised keyboard firmware or USB keylogging devices on the keyboard USB port), but does not hold when the end-user device is compromised by malware. In the latter case, malware might steal the password database and log the user’s master password for that database, if set.

### V-C Threats to validity

As with other human-subject and field studies, we cannot eliminate all threats to the validity of our study. We targeted Google Chrome users, since Chrome had the highest market share both in general [7] and among our survey participants. Further, we recruited only experienced US workers on Amazon MTurk, who might not be representative of other populations or cultures (external validity); however, our demographics and password statistics align with prior studies. Furthermore, we collected our data in the wild, which yields high ecological validity and avoids common problems of password lab studies [42], but on the downside does not give control over all variables (internal validity). We asked our participants to behave naturally and tried to encourage this behavior through transparency, availability, and above-average payment; however, like the closest related work [65, 50], we cannot exclude that some participants behaved unusually.

## References

• [2] “Alexa Web Information Service: Developer Guide (API Version 2005-07-11),” https://docs.aws.amazon.com/AlexaWebInfoService/latest/.
• [3] “Github: dropbox/zxcvbn,” https://github.com/dropbox/zxcvbn.
• [5] “LastPass,” https://www.lastpass.com.
• [6] “pwgen(1) - linux man page,” https://linux.die.net/man/1/pwgen.
• [7] “W3Counter: Browser & Platform Market Share (November 2017),” https://www.w3counter.com/globalstats.php.
• [9] D. V. Bailey, M. Dürmuth, and C. Paar, “Statistics on password re-use and adaptive strength for financial accounts,” in Proc. 9th International Conference on Security and Cryptography for Networks (SCN’14), 2014.
• [10] D. Bates, M. Maechler, B. Bolker, S. Walker, R. H. B. Christensen, H. Singmann, B. Dai, G. Grothendieck, and P. Green, “lme4: Linear mixed-effects models using ’eigen’ and s4,” https://cran.r-project.org/web/packages/lme4/index.html.
• [11] J. Bonneau, “The science of guessing: Analyzing an anonymized corpus of 70 million passwords,” in Proc. 33rd IEEE Symposium on Security and Privacy (SP ’12).   IEEE Computer Society, 2012.
• [12] J. Bonneau, C. Herley, P. C. v. Oorschot, and F. Stajano, “The quest to replace passwords: A framework for comparative evaluation of web authentication schemes,” in Proc. 33rd IEEE Symposium on Security and Privacy (SP ’12).   IEEE Computer Society, 2012.
• [13] J. Bonneau and S. Preibusch, “The password thicket: technical and market failures in human authentication on the web,” in 9th Workshop on the Economics of Info Security (WEIS’10), 2010.
• [14] C. Castelluccia, M. Dürmuth, and D. Perito, “Adaptive password-strength meters from markov models,” in Proc. 19th Annual Network and Distributed System Security Symposium (NDSS ’12).   The Internet Society, 2012.
• [15] S. Chiasson, P. C. van Oorschot, and R. Biddle, “A usability study and critique of two password managers,” in Proc. 15th USENIX Security Symposium (SEC ’06).   USENIX Association, 2006.
• [16] R. H. B. Christensen, “ordinal: Regression models for ordinal data,” https://cran.r-project.org/web/packages/ordinal/index.html.
• [17] A. Das, J. Bonneau, M. Caesar, N. Borisov, and X. Wang, “The tangled web of password reuse,” in Proc. 21st Annual Network and Distributed System Security Symposium (NDSS ’14).   The Internet Society, 2014.
• [18] X. de Carné de Carnavalet and M. Mannan, “From very weak to very strong: Analyzing password-strength meters,” in Proc. 21st Annual Network and Distributed System Security Symposium (NDSS ’14).   The Internet Society, 2014.
• [19] X. de Carné de Carnavalet and M. Mannan, “From very weak to very strong: Analyzing password-strength meters,” in Proc. 21st Annual Network and Distributed System Security Symposium (NDSS ’14).   The Internet Society, 2014.
• [20] M. Dell’Amico, P. Michiardi, and Y. Roudier, “Password strength: An empirical analysis,” in Proc. 29th Conference on Information Communications (INFOCOM’10).   IEEE Press, 2010.
• [21] M. Dürmuth, F. Angelstorf, C. Castelluccia, D. Perito, and A. Chaabane, “Omen: Faster password guessing using an ordered markov enumerator,” in Proc. 7th International Symposium on Engineering Secure Software and Systems (ESSoS 2015).   Springer, 2015.
• [22] Dynamo Wiki, “Guidelines for academic requesters (version 2.0),” http://wiki.wearedynamo.org/index.php/Guidelines_for_Academic_Requesters, last visited: 11/10/17.
• [23] C. E. Shannon, “Prediction and entropy of printed english,” 1951.
• [24] S. Egelman, A. Sotirakopoulos, I. Muslukhov, K. Beznosov, and C. Herley, “Does my password go up to eleven?: The impact of password meters on password selection,” in Proc. SIGCHI Conference on Human Factors in Computing Systems (CHI’13).   ACM, 2013.
• [25] S. Fahl, M. Harbach, Y. Acar, and M. Smith, “On the ecological validity of a password study,” in Proc. 9th Symposium on Usable Privacy and Security (SOUPS’13).   ACM, 2013.
• [26] A. Field and J. Miles, Discovering Statistics Using R.   Sage Publications Ltd., 5 2012.
• [27] D. Florencio and C. Herley, “A large-scale study of web password habits,” in Proc. 16th International Conference on World Wide Web (WWW’07).   ACM, 2007.
• [28] S. Gaw and E. W. Felten, “Password management strategies for online accounts,” in Proc. 2nd Symposium on Usable Privacy and Security (SOUPS’06).   ACM, 2006.
• [29] P. A. Grassi, J. L. Fenton, E. M. Newton, R. A. Perlner, A. R. Regenscheid, W. E. Burr, and J. P. Richer, “NIST SP800–63B: Digital authentication guideline (Authentication and Lifecycle Management),” June 2017, last visited: 10/11/17. [Online]. Available: https://doi.org/10.6028/NIST.SP.800-63b
• [30] E. Hayashi and J. Hong, “A diary study of password usage in daily life,” in Proc. SIGCHI Conference on Human Factors in Computing Systems (CHI’11).   ACM, 2011.
• [31] C. Herley and P. van Oorschot, “A research agenda acknowledging the persistence of passwords,” IEEE Security and Privacy, vol. 10, no. 1, pp. 28–36, Jan. 2012.
• [32] B. Hitaj, P. Gasti, G. Ateniese, and F. Pérez-Cruz, “Passgan: A deep learning approach for password guessing,” CoRR, vol. abs/1709.00440, 2017. [Online]. Available: http://arxiv.org/abs/1709.00440
• [33] J. J. Hox, M. Moerbeek, and R. van de Schoot, Multilevel Analysis: Techniques and Applications, Third Edition (Quantitative Methodology), 3rd ed.   Quantitative Methodology Series, 9 2017.
• [34] J. L. Huang, N. A. Bowling, M. Liu, and Y. Li, “Detecting insufficient effort responding with an infrequency scale: Evaluating validity and participant reactions,” Journal of Business and Psychology, vol. 30, no. 2, pp. 299–311, 2015.
• [35] T. Hunt, “The science of password selection,” https://www.troyhunt.com/science-of-password-selection/, Jul. 2011.
• [36] ——, “The "cobra effect" that is disabling paste on password fields,” https://www.troyhunt.com/the-cobra-effect-that-is-disabling/, May 2014.
• [38] ——, “Password reuse, credential stuffing and another billion records in have i been pwned,” https://www.troyhunt.com/password-reuse-credential-stuffing-and-another-1-billion-records-in-have-i-been-pwned/, May 2017.
• [39] P. G. Inglesant and M. A. Sasse, “The true cost of unusable password policies: Password use in the wild,” in Proc. SIGCHI Conference on Human Factors in Computing Systems (CHI’10).   ACM, 2010.
• [40] A. Karole, N. Saxena, and N. Christin, “A comparative usability evaluation of traditional password managers,” in Proceedings of the 13th International Conference on Information Security and Cryptology.   Springer-Verlag, 2011.
• [41] P. G. Kelley, S. Komanduri, M. L. Mazurek, R. Shay, T. Vidas, L. Bauer, N. Christin, L. F. Cranor, and J. Lopez, “Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms,” in Proc. 33rd IEEE Symposium on Security and Privacy (SP ’12).   IEEE Computer Society, 2012.
• [42] S. Komanduri, R. Shay, P. G. Kelley, M. L. Mazurek, L. Bauer, N. Christin, L. F. Cranor, and S. Egelman, “Of passwords and people: Measuring the effect of password-composition policies,” in Proc. SIGCHI Conference on Human Factors in Computing Systems (CHI’11).   ACM, 2011.
• [43] P. Kumaraguru and L. F. Cranor, “Privacy Indexes: A Survey of Westin’s Studies,” Tech. Rep., 2005.
• [44] Z. Li, W. He, D. Akhawe, and D. Song, “The emperor’s new password manager: Security analysis of web-based password managers,” in Proc. 23rd USENIX Security Symposium (SEC’ 14).   USENIX Association, 2014.
• [45] R. McMillan, “The world’s first computer password? it was useless too,” https://www.wired.com/2012/01/computer-password/, 2012.
• [46] W. Melicher, B. Ur, S. M. Segreti, S. Komanduri, L. Bauer, N. Christin, and L. F. Cranor, “Fast, lean, and accurate: Modeling password guessability using neural networks,” in Proc. 25th USENIX Security Symposium (SEC’ 16).   USENIX Association, 2016.
• [47] P. Moore, “Don’t let them paste passwords…” https://paul.reviews/dont-let-them-paste-passwords/, Jul. 2015.
• [48] R. Morris and K. Thompson, “Password security: A case history,” Commun. ACM, vol. 22, no. 11, pp. 594–597, Nov. 1979.
• [49] G. Notoatmodjo and C. Thomborson, “Passwords and perceptions,” in Proc. 7th Australasian Conference on Information Security - Volume 98.   Australian Computer Society, Inc., 2009.
• [50] S. Pearman, J. Thomas, P. E. Naeini, H. Habib, L. Bauer, N. Christin, L. F. Cranor, S. Egelman, and A. Forget, “Let’s go in for a closer look: Observing passwords in their natural habitat,” in Proc. 24th ACM Conference on Computer and Communication Security (CCS ’17).   ACM, 2017.
• [51] C. Rinn, K. Summers, E. Rhodes, J. Virothaisakun, and D. Chisnell, “Password creation strategies across high- and low-literacy web users,” in Proc. 78th ASIS&T Annual Meeting (ASIST’15).   American Society for Information Science, 2015.
• [53] R. Shay, S. Komanduri, P. G. Kelley, P. G. Leon, M. L. Mazurek, L. Bauer, N. Christin, and L. F. Cranor, “Encountering stronger password requirements: User attitudes and behaviors,” in Proc. 6th Symposium on Usable Privacy and Security (SOUPS’10).   ACM, 2010.
• [54] D. Silver, S. Jana, D. Boneh, E. Chen, and C. Jackson, “Password managers: Attacks and defenses,” in Proc. 23rd USENIX Security Symposium (SEC’ 14).   USENIX Association, 2014.
• [55] E. Stobert and R. Biddle, “The password life cycle: User behaviour in managing passwords,” in Proc. 10th Symposium on Usable Privacy and Security (SOUPS’14).   USENIX Association, 2014.
• [56] B. Stock and M. Johns, “Protecting users against xss-based password manager abuse,” in Proc. 9th ACM Symposium on Information, Computer and Communication Security (ASIACCS ’14).   ACM, 2014.
• [57] L. Tam, M. Glassman, and M. Vandenwauver, “The psychology of password management: a tradeoff between security and convenience,” Behaviour & Information Technology, vol. 29, no. 3, pp. 233–244, 2010.
• [58] The University of Chicago – IT Services, “Strengthen your passwords or passphrases and keep them secure,” https://uchicago.service-now.com/it?id=kb_article&kb=KB00015347, Oct. 2017.
• [59] B. Ur, F. Alfieri, M. Aung, L. Bauer, N. Christin, J. Colnago, L. F. Cranor, H. Dixon, P. E. Naeini, H. Habib, N. Johnson, and W. Melicher, “Design and evaluation of a data-driven password meter,” in Proc. SIGCHI Conference on Human Factors in Computing Systems (CHI’17).   ACM, 2017.
• [60] B. Ur, P. G. Kelley, S. Komanduri, J. Lee, M. Maass, M. L. Mazurek, T. Passaro, R. Shay, T. Vidas, L. Bauer, N. Christin, and L. F. Cranor, “How does your password measure up? the effect of strength meters on password creation,” in Proc. 21st USENIX Security Symposium (SEC ’12).   USENIX Association, 2012.
• [61] B. Ur, F. Noma, J. Bees, S. M. Segreti, R. Shay, L. Bauer, N. Christin, and L. F. Cranor, “"i added ’!’ at the end to make it secure": Observing password creation in the lab,” in Proc. 11th Symposium on Usable Privacy and Security (SOUPS’15).   USENIX Association, 2015.
• [62] B. Ur, S. M. Segreti, L. Bauer, N. Christin, L. F. Cranor, S. Komanduri, D. Kurilova, M. L. Mazurek, W. Melicher, and R. Shay, “Measuring real-world accuracies and biases in modeling password guessability,” in Proc. 24th USENIX Security Symposium (SEC’ 15).   USENIX Association, 2015.
• [63] U.S. Census Bureau, “2010 Census National Summary File of Redistricting Data,” https://www.census.gov/2010census/data/, 2011.
• [64] D. Wang and P. Wang, “The emperor’s new password creation policies: An evaluation of leading web services and the effect of role in resisting against online guessing,” in Proc. 20th European Symposium on Research in Computer Security (ESORICS’15).   Springer, 2015.
• [65] R. Wash, E. Rader, R. Berman, and Z. Wellmer, “Understanding password choices: How frequently entered passwords are re-used across websites,” in Proc. 12th Symposium on Usable Privacy and Security (SOUPS’16).   USENIX Association, 2016.
• [66] M. Weir, S. Aggarwal, B. d. Medeiros, and B. Glodek, “Password cracking using probabilistic context-free grammars,” in Proc. 30th IEEE Symposium on Security and Privacy (SP ’09).   IEEE Computer Society, 2009.
• [67] M. Weir, S. Aggarwal, M. Collins, and H. Stern, “Testing metrics for password creation policies by attacking large sets of revealed passwords,” in Proc. 17th ACM Conference on Computer and Communication Security (CCS ’10).   ACM, 2010.
• [68] D. L. Wheeler, “zxcvbn: Low-budget password strength estimation,” in Proc. 25th USENIX Security Symposium (SEC’ 16).   USENIX Association, 2016.
• [69] A. Woodruff, V. Pihur, S. Consolvo, L. Brandimarte, and A. Acquisti, “Would a privacy fundamentalist sell their DNA for $1000…if nothing bad happened as a result? the westin categories, behavioral intentions, and consequences,” in Proc. 10th Symposium on Usable Privacy and Security (SOUPS’14).   USENIX Association, 2014.
• [70] J. Yan, A. Blackwell, R. Anderson, and A. Grant, “Password memorability and security: Empirical results,” IEEE Security and Privacy, vol. 2, no. 5, pp. 25–31, Sep. 2004. [Online]. Available: http://dx.doi.org/10.1109/MSP.2004.81
• [71] R. Zhao, C. Yue, and K. Sun, “Vulnerability and risk analysis of two commercial browser and cloud based password managers,” ASE Science Journal, vol. 1, no. 4, pp. 1–15, 2013.

## Appendix A Initial Survey Questions

Q1: For each of the following statements, how strongly do you agree or disagree?
a1: Consumers have lost all control over how personal information is collected and used by companies.
a2: Most businesses handle the personal information they collect about consumers in a proper and confidential way.
a3: Existing laws and organizational practices provide a reasonable level of protection for consumer privacy today.
(i) Strongly disagree, (ii) Somewhat disagree, (iii) Somewhat agree, (iv) Strongly agree

Q2: On how many different Internet sites do you have a user account that is secured with a password? (If you are not sure about the number please estimate the number) (FreeText)

Q3: Has ever one of your passwords been leaked or been stolen?
(i) Yes, (ii) No, (iii) I am not aware of that, (iv) I do not care

Q4: How strongly do you agree or disagree?
b1. Passwords are useless, because hackers can steal my data either way.
(i) Strongly disagree, (ii) Somewhat disagree, (iii) Somewhat agree, (iv) Strongly agree
b2. I don’t care about my passwords’ strength, because I don’t have anything to hide.
(i) Strongly disagree, (ii) Somewhat disagree, (iii) Somewhat agree, (iv) Strongly agree

Q5: What characterizes in your opinion a strong/secure password? (FreeText)

Q6: Please rate the strength of the following passwords?
c1. thHisiSaSecUrePassWord
c2. Pa$sWordsk123
c3. AiWuutaiveep9j
c4. !@#\$%^&*()
c5. 12/07/2017
(i) Very weak, (ii) Weak, (iii) Moderate strength, (iv) Strong, (v) Very strong

Q7: I have never used a computer?
(i) I have never, (ii) I do

(i) 5(high ability), (ii) 4, (iii) 3, (iv) 2, (v) 1(low ability)

Q9: How do you proceed if you have to create a new password? (What is your strategy?) (FreeText)

Q10: I try to create secure passwords…..
(i) for all my accounts and websites, (ii) for my email accounts, (iii) for online shopping, (iv) for online booking/reservation, (v) for social networks, (vi) No answer, (vii) Other

Q11: I make a point of changing my passwords on websites that are critical to my privacy every…… (choose the closest match)
(i) Day, (ii) Week, (iii) Two weeks, (iv) Month, (v) 6 month, (vi) Year, (vii) Never, (viii) Other

Q12: Do you use the same password for different email accounts, websites, or devices?
(i) Yes, (ii) No

Q13: Do you use any of the following strategies for creating your password or part of your password, anywhere, at any time in the last year…

Q15: Have you ever used a computer program to generate your passwords?
(i) Yes, (ii) No

Q16: When creating a new password, which do you regard as most important: choosing a password that is easy to remember for future use (ease of remembering) or the password’s security?
(i) Always ease of remembering, (ii) Mostly ease of remembering, (iii) Mostly security, (iv) Always security, (v) Other

Q17: When you create a new password, which of the following factors do you consider? The password ….
(i) does not contain dictionary words, (ii) is in a foreign (non-English) language, (iii) is not related to the site (i.e., the name of the site), (iv) includes numbers, (v) includes special characters (e.g. "&" or "!"), (vi) is at least eight (8) characters long, (vii) None of the above: I didn’t think about it, (viii) No answer, (ix) Other

Q18: My home planet is Earth?
(i) Yes, (ii) No

(i) Yes, (ii) No

Q20: Do you use any kind of extra password manager program (for instance, LastPass, 1Password, Keepass, Dashlane, etc.)?
(i) Yes, (ii) No

Q21: Which password manager(s) do you use? (You can write one name per line) (FreeText)

Q24: I am (i) Female, (ii) Male, (iii) Other, (iv) No answer

Q25: My age group is (i) under 18 years, (ii) 18 to 30, (iii) 31 to 40, (iv) 41 to 50, (v) 51 to 60, (vi) 61 to 70, (vii) 71 or older, (viii) Other

Q26: My native language is (FreeText)

Q27: My primary web browser is (i) Chrome, (ii) Firefox, (iii) Internet explorer/ Edge, (iv) Safari, (v) Opera, (vi) Other

Q28: For browsing websites, I use (i) Almost exclusively my smartphone/tablet, (ii) Mostly my smartphone / tablet, (iii) Almost exclusively my desktop / laptop computer, (iv) Mostly my desktop / laptop computer

Q29: What is the highest degree or level of education you have completed?
(i) Less than high school, (ii) High school graduate (includes equivalency), (iii) Some college/no degree, (iv) Associate’s degree, (v) Bachelor’s degree, (vi) Ph.D, (vii) Graduate or professional degree, (viii) Other

Q30: Are you majoring in or do you have a degree or job in computer science, computer engineering, information technology, or a related field?
(i) Yes, (ii) No

(i) White/Caucasian, (ii) Black/African American, (iii) Asian, (iv) Hispanic/Latino, (v) Middle Eastern, (vi) Native American/Alaska native, (vii) Native Hawaiian/Pacific Islander, (viii) Multiracial, (ix) Other

## Appendix B Exit Survey Questions

ES1: I do not use any kind of 3rd party password manager, such as 1Password, LastPass, etc., because
(i) I do not trust the password manager software or vendor, (ii) of the lack of support for my devices, (iii) of the lack of synchronization between different devices, (iv) I would have to spend money on it, (v) they are not simple to set up and/or not easy to use, (vi) I can manage my passwords myself and a password manager would not provide any additional benefits, (vii) Chrome’s password saving feature suffices for me, (viii) there are too many available managers and I am not sure which one would be right for me, (ix) I never really thought about using a 3rd party password manager or was never interested in them, (x) Other

ES2: Have you ever used any kind of 3rd party password manager in the past, and then stopped using it?
(i) Yes, (ii) No

ES3: Please mention which 3rd party password manager have you used in the past. (FreeText)

ES4: Please shortly explain the reasons why you stopped using the 3rd party password manager. (FreeText)
