Ask HN: Is there a good reason for disallowing some characters from a password? I see this restriction quite often, and it makes no sense to me whatsoever. Am I missing a compelling reason for this practise, or is it an example of bad design?
That said, I did actually run into an instance where having ";-- in your password would trigger the WAF during login and because we needed to ship ASAP the easiest way to get around that was to ban ; in passwords. I don't think we ever went back to fix that one...
This is a misconception. Password length is far more important than allowing a few "tricky" non-alphanumerics. It aids entropy, but it's not some security silver bullet. Also, if the web service you're using is storing undigested passwords then all bets are off.
https://support.1password.com/pbkdf2/
Clocking in at a cracking cost of 79 million USD, for most intents and purposes, even a rather trivial 56-bit entropy password such as "align-caught-boycott-delete" (or "correct horse battery staple", for that matter) would be prohibitively expensive to break.
What system allows you to try 2⁴³ passwords in half a jiffy?
No provider is going to let anyone try that many combinations against a login API, but let's consider the case where the hashes have been captured. Hashcat on a Radeon RX 6650 can test about 30 billion MD5 hashes per second, about 200,000 sha512crypt hashes per second, about 500,000 MacOS PBKDF2 passwords per second, and about 32,000 bcrypt hashes per second.[2][3]
To brute-force the "four random English words" space for a single password, I therefore calculate:
MD5: 333,333 seconds (a little under 4 days)
sha512crypt: 50,000,000,000 seconds (578,703 days, or 1,585 years)
Mac OS PBKDF2: 20,000,000,000 seconds (231,481 days, or 634 years)
bcrypt: 312,500,000,000 seconds (3,616,898 days, or 9909 years)
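For anyone who wants to check the arithmetic, here is a rough sketch. It assumes a keyspace of 10,000^4 = 10^16 (four words drawn from a 10,000-word list), which is what the times above imply; the rates are the Hashcat figures quoted for the RX 6650. The names `KEYSPACE`, `RATES` and `exhaust_seconds` are only for illustration.

```python
# Back-of-the-envelope check of the brute-force times listed above.
# Assumption: four words from a 10,000-word list, i.e. a keyspace of 10**16.
KEYSPACE = 10_000 ** 4

# Guesses per second, as quoted above for a Radeon RX 6650.
RATES = {
    "MD5": 30_000_000_000,
    "sha512crypt": 200_000,
    "Mac OS PBKDF2": 500_000,
    "bcrypt": 32_000,
}


def exhaust_seconds(rate: int, keyspace: int = KEYSPACE) -> float:
    """Seconds needed to try every candidate once at the given rate."""
    return keyspace / rate


for name, rate in RATES.items():
    seconds = exhaust_seconds(rate)
    days = seconds / 86_400
    years = days / 365
    print(f"{name}: {seconds:,.0f} s, {days:,.0f} days, {years:,.0f} years")
```

These are worst-case times for exhausting the whole space; on average an attacker would hit the right combination after about half of that.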
No one recommends storing passwords as MD5 hashes anymore, but that's the fastest algorithm Hashcat supports. When using the kind of hash that information security specialists tend to recommend these days, it seems like the XKCD method is still pretty safe. Am I missing something? Did I calculate something incorrectly?
Edit 1: Fixed the figures for sha512crypt.
Edit 2: for the NVidia A100 you mentioned in another branch of this thread, it would be about ten times faster per GPU, but it's still an impractically long time for the modern password hashes unless the adversary has millions of dollars to spend on cracking a high-value account's password.
[1] https://wordcounter.io/blog/how-many-words-are-in-the-englis...
[2] https://hashcat.net/forum/thread-10919.html
[3] It would be slower to handle the four English words case, because AFAIK you'd need to use the wordlist mode instead of straight brute force.
[4] https://gist.github.com/Chick3nman/d65bcd5c137626c0fcb05078b...
Some emoji, for example, are combinations of multiple other emoji, and a given combined emoji may not be uniquely represented by a sequence of codepoints. In the pathological case, this could mean that an OS update on the user's system changes the composition of the same emoji, which might make it impossible for them to input their password. It is probably prudent for a system to disallow emoji passwords.
One step away from Emoji, Unicode also allows for other m̸̱̜̅ͅȋ̴̩̠̀s̸̺͐c̶͈͇͉̐͛̚h̸̤̣̆i̴͍͍͒͌e̴̲̽̓f̸̞̽̊. Chances are, full-on Zalgo passwords can lead to problems. Again, there are probably prudent reasons to restrict some characters. On the other hand, those modifiers exist for a reason, and disallowing phrases in the user's native language doesn't make for great UX.
Towards the more common use of Unicode, there is a pretty good _practical_ reason to restrict the use of some non-ASCII characters: if your system accepts ç, ö and ø as characters in passwords, and non-technical users venture into a part of the world where the keyboard layout doesn't, your helpdesk is going to have to deal with the occasional annoyed customer. From a systems design perspective, those characters seem fine -- operationally, they may cause headaches.
Finally, we've arrived at printable ASCII characters. Restrictions on maximum length (usually 6 or 8 characters), and on certain characters (%, & or :) tend to be based on interactions with legacy systems (e.g. DES crypt() used to have an 8-character maximum), or on bad input handling. Either way, it's probably a bad sign.
I think it took me about five reboots in single-user mode and password resets before something clicked. I wish Ubuntu would not have allowed special characters. :)
So if your password is "password", it will get entered in as "Password" - and the user will get confused why their username/password aren't logging them in.
So a UX pattern is to actually lowercase the first letter on the backend.
While this technically slightly lowers security (they are trying 4 passwords built from the one you typed in), I don't think that's significant, and I imagine it greatly improves user experience.
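A rough sketch of what that could look like on the backend. The exact four variants aren't spelled out above, so the set here is an assumption (as typed, first letter flipped, caps lock undone, both), and `verify` stands in for whatever real hash check the site already has.

```python
from typing import Callable, List


def flip_first(s: str) -> str:
    """Swap the case of the first character only."""
    return s[:1].swapcase() + s[1:]


def candidate_passwords(typed: str) -> List[str]:
    """Return the typed password plus tolerated variants, without duplicates."""
    variants = [
        typed,                         # exactly as typed
        flip_first(typed),             # undo auto-capitalisation of the first letter
        typed.swapcase(),              # undo caps lock
        flip_first(typed.swapcase()),  # caps lock plus auto-capitalisation
    ]
    return list(dict.fromkeys(variants))  # drop duplicates, keep order


def check_login(typed: str, verify: Callable[[str], bool]) -> bool:
    """Accept the login if any candidate passes the real verifier."""
    return any(verify(candidate) for candidate in candidate_passwords(typed))
```

With this, typing "Password" when the stored password is "password" still gets you in, at the cost of the attacker getting up to four guesses per attempt.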
You have to draw the line somewhere and degrading the majority’s experience for the minority’s benefit is an unusual trade-off.
Whatever happened to, “Design for the expert user”?
I don't understand why this would cause an expert user trouble (it's the loss of a single bit of password security, which shouldn't matter if your password is even reasonably decent).
In addition, not all keyboard environments are capable of inputting the same set of emoji. A coworker once got locked out of his macbook because the UX when changing the password when already logged in allowed inputting emoji, but the OS login screen did not (could be misremembering some specifics, but the broad point remains).
Which I suppose is really a subset of the sorts of issues around ç, ö and ø, but how it can happen even on the same system in different contexts.
In general, passwords are not treated as essential for access, and there will be recovery techniques, and the number of password resets or whatever required because they can’t type such-and-such a character any more or on this new device will be a minuscule fraction of the total. Resetting passwords and other forms of lost account access is typically not an exceptional path. From what I’ve seen, for business-to-customer businesses that don’t have some form of self-serve account recovery, “I’ve lost access to my account” will routinely be half your ticket volume.
In those fairly uncommon situations where passwords are essential for access (e.g. where it’s an encryption key), well, it’s still up to the user, and the user is somewhat more likely to be aware of any potential hazards in such fanciness.
Overall, I say: stop trying to be clever; accept what is set before you without asking questions on the grounds of compatibility. Let the user do what they try to do.
Maybe normalise Unicode; it’s a harmless thing to do and has the potential to improve compatibility slightly on very unusual input devices. (I don’t think I’d bother, myself.) But beyond that, I’m not sold on the arguments for restricting possibilities.
However, in my opinion, for real-world systems, you need to strike a balance between technical and operational, and user experience concerns. If restricting your password space to printable ASCII characters can meaningfully decrease the amount of the tickets that generate half of your ticket volume, you should give it some serious thought.
There are good arguments for both approaches, and the right way also depends on your user base. There was a story about WhatsApp a while ago, criticizing that WhatsApp would only notify users when their contacts' security code had changed, whereas Signal (and other secure messengers) would block and ask for confirmation first. Signal currently sits at 100M+ downloads in Play store, WhatsApp sits at 5B+. The numbers are very vague, but WA has 1-2 orders of magnitude more users than Signal.
In the WhatsApp example, a small change in the process can mean that good security becomes accessible to a pool of billions of users, vs. excellent security to millions. Restricting the password character set (to a sensible set of characters, and with a sensible length limit) comes with no security drawbacks, and good chances of some process/usability improvements. For a real-world deployment, I would argue it's very prudent.
Sincere efforts at breaking password hashes are something else than a single individual with one GPU at their disposal - it's not the angry neighbor capturing your Wi-Fi traffic or some "randoms" on the dark web who got their hands on a leaked database.
Realistically you will never need to exhaust the full key space (vocabulary), even if the commonly used set would be as high as 10 000. If you refuse to use a password manager and random character strings for passwords then at least don't settle for just four words, because you'll be going for common and memorable words, not something from the fringes of the dictionary. Unlike the case of a bunch of random characters, when picking a couple of words that you can remember easily there's a psychological factor involved which can be attacked, so make it count.
No security drawback? You make it harder for people to use the password they want to use. There is a real cost to that, encouraging bad password hygiene. Provided you support at least printable ASCII it’s unlikely to be a significant cost, but it is a cost, and I remain entirely unconvinced of the practical benefits of the restriction.
Some sign-up forms don't even give you feedback on which characters are problematic. The Oracle Cloud one kept erroring with "you need one uppercase, one lowercase, and one number" when what it meant to say was "remove that tilde". That took a while to figure out.
I mean, you're not supposed to write down passwords, but with all the various restrictions you can't even use a consistent convention so you can actually remember them all.
You're supposed to use a password manager. Preferably with a passphrase and a second factor like a keyfile or hardware token.
If you do this, you should really save the version of the normalizing table you used, since they change over time.
what are these and why do you need to do it?
This is independent of the Unicode encoding, which turns those codepoints into bytes, for example using UTF-8 this gives C3A9 or 65CC81.
Users don't really have control over what their keyboard/application puts in the text field when they press the button, and obviously the hashes of the two forms are different, so the password wouldn't match. Normalization is the process of turning the characters into their composed form (in my example "\u00E9") or their decomposed form ("\u0065\u0301"), so you can then compare your codepoints/bytes/hashes.
https://en.wikipedia.org/wiki/Unicode_equivalence#Normalizat...
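A small illustration using Python's standard `unicodedata` module. The helper name `normalize_password` and the default of NFC are just choices for this sketch; what matters is picking one form and applying it both when the password is set and when it is checked.

```python
import unicodedata

composed = "\u00e9"          # é as a single code point
decomposed = "\u0065\u0301"  # e followed by a combining acute accent

# Same visible character, different code points and different UTF-8 bytes.
print(composed == decomposed)            # False
print(composed.encode("utf-8").hex())    # c3a9
print(decomposed.encode("utf-8").hex())  # 65cc81


def normalize_password(password: str, form: str = "NFC") -> str:
    """Normalize before hashing so equivalent inputs produce the same bytes."""
    return unicodedata.normalize(form, password)


print(normalize_password(composed) == normalize_password(decomposed))  # True
```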
When they forbid backslashes and quotes, it's even better: someone didn't know how to use query parameters or escape database values. It's a sign that their software is as secure as a "watch out for the dog" sign.
For a specific example Oracle Database has a very restrictive list of characters allowed in a user password. If you're using Database Users behind the scenes (even if not directly, but via an Oracle integration) you're subject to those same restrictions. Up until Oracle 11g passwords were also limited to 30 characters and a few releases before that were case-insensitive (!).
Is this a good reason? I'd argue, no, but I've worked at tons of organizations where "things that don't make sense" often have an explanation even if it isn't an explanation you're happy with. We should definitely push companies to use cryptographically secure one-way hashing functions with salts, and adjustable difficulty.
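For what "one-way, salted, adjustable difficulty" can look like, here is a minimal sketch using PBKDF2 from Python's standard library. The iteration count and the function names are assumptions for illustration, not a recommendation for any particular system; bcrypt, scrypt or Argon2 would fit the same shape.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 600_000  # adjustable work factor; this particular value is an assumption


def hash_password(password: str, iterations: int = ITERATIONS) -> Tuple[bytes, int, bytes]:
    """Return (salt, iterations, digest) for storage."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest


def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(digest, expected)
```

Because the password is only ever fed to the hash as bytes, nothing here cares which characters it contains or how long it is.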
The keyboards in the lab were heavily used and were noisy. The space bar, because of its shape, sounded distinctly different from the other keys. I stayed away from the admins when they entered the password, like a decent citizen, but listened in and found that the password was 7 characters long and also that the second and sixth characters were spaces (thanks to the different sound of the key). So .˽...˽.
I brute forced this using a shell script (since I had just learned how to write shell scripts), ran it overnight, and got in the next day.
So yes, I think there might, at least in theory, be good reasons to avoid certain characters in a password.
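To put a number on how much the two overheard spaces helped, here is a quick calculation. The alphabet is an assumption (95 printable ASCII characters including space, 94 without); the real password may well have used a smaller set.

```python
# Assumption: passwords drawn from printable ASCII.
PRINTABLE_WITH_SPACE = 95
PRINTABLE_NO_SPACE = 94

LENGTH = 7        # overheard: seven keystrokes
KNOWN_SPACES = 2  # overheard: positions 2 and 6 are spaces

# Search space with only the length known.
full = PRINTABLE_WITH_SPACE ** LENGTH
# Search space once two positions are fixed and the rest are known non-spaces.
reduced = PRINTABLE_NO_SPACE ** (LENGTH - KNOWN_SPACES)

print(f"length only:     {full:,}")
print(f"with the spaces: {reduced:,}")
print(f"reduction:       about {full // reduced:,}x")
```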
It is thus a security Best Practice for streamers and the likes to mute their microphones while typing passwords.
Really, all senses leak information like this. Wifi signals are enough to see round corners and steal passwords. Even wearing a sleeveless shirt and having your upper arms visible to a camera leaks a little information from the small arm and theoretically even muscle movements.
Also, since my password manager types letters one by one, I wouldn't use tabs or line feeds.
Maybe don't use grapheme clusters that have multiple valid encodings and make up for it by using a longer password instead?
Because of that, outlawing the likes of line feed, carriage return and backspace (raw input on a tty will store those in passwords, but good luck entering them in a web form) makes sense, as does normalizing Unicode input (typing ‘é’ on their phone may produce a byte sequence that’s different from typing ‘é’ on their PC)
Apart from that, it should not be necessary. If, however, you don’t trust your programmers to do the right thing, you may want to rule out characters that are related to security incidents such as single quotes, and also may want to prevent users from entering strings that might get decoded to such strings such as ‘"’.
That path can be endless, though. If you forbid ‘&’, because your programmers might accidentally html-decode it, should you guard against double html-decoding? URI-decoding and then uudecoding? Getting programmers you can trust to do the right thing and giving them the time to do so is the better option.
But they're probably just storing it in plaintext on some legacy system that can't handle certain characters. Or the plaintext goes through one of those systems on its way to being hashed and salted.
For characters outside that range, there is a good reason: it's hard to type those characters consistently across different platforms/systems, and they don't want you to lock yourself out over that.
> Verifiers SHALL require subscriber-chosen memorized secrets to be at least 8 characters in length. Verifiers SHOULD permit subscriber-chosen memorized secrets at least 64 characters in length. All printing ASCII [RFC 20] characters as well as the space character SHOULD be acceptable in memorized secrets. Unicode [ISO/ISC 10646] characters SHOULD be accepted as well.
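A sketch of a checker that follows the quoted rules and nothing more. `MAX_LENGTH` is a hypothetical cap (the quote only asks that at least 64 characters be permitted), and length is counted in code points here, which is an assumption of this sketch.

```python
from typing import List

MIN_LENGTH = 8
MAX_LENGTH = 128  # hypothetical cap; anything >= 64 satisfies the quoted SHOULD


def check_password(password: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means acceptable."""
    problems = []
    length = len(password)  # counts Unicode code points, not bytes
    if length < MIN_LENGTH:
        problems.append(f"must be at least {MIN_LENGTH} characters (got {length})")
    if length > MAX_LENGTH:
        problems.append(f"must be at most {MAX_LENGTH} characters (got {length})")
    # Deliberately no character-class rules: printable ASCII, spaces and
    # Unicode are all accepted, as in the quoted guidance.
    return problems
```

Returning the specific problems, rather than one generic error, also lets the form tell the user exactly what to change.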
- Requires quoting or escaping in the shell or some other programming environment
- Hard to type on mobile keyboard.
- Not in a given person's touch-typing repertoire.
The correct way to think about password security is as randomly generating a binary string of the desired security strength/length and then encoding it. If you generate 16 random bytes, that's 128 bits of security whether you encode it with hex, base32 or base64.
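To make that concrete, a short sketch with Python's `secrets` module: the same 16 random bytes rendered three ways. The helper name `encode_secret` is only for illustration; padding characters are stripped because they carry no information.

```python
import base64
import secrets
from typing import Dict


def encode_secret(raw: bytes) -> Dict[str, str]:
    """Render the same random bytes in three encodings (padding stripped)."""
    return {
        "hex": raw.hex(),
        "base32": base64.b32encode(raw).decode("ascii").rstrip("="),
        "base64": base64.urlsafe_b64encode(raw).decode("ascii").rstrip("="),
    }


raw = secrets.token_bytes(16)  # 16 random bytes = 128 bits
for name, text in encode_secret(raw).items():
    print(f"{name:>6} ({len(text)} chars): {text}")
```

The strings differ in length and alphabet, but each decodes back to the same 128 bits, so the strength is identical. A richer character set only makes the password shorter to type.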
Required characters also do little to improve security, since there is usually only 1 of each kind of required character, and it's often at the beginning or end. They don't cause the user to select a random string from a meaningfully larger space.
What I cannot get is sites that make you play 20 questions to figure out their rules instead of just telling you, as in my experience, it leads to lousy passwords that meet only the bare minimum. I seem to recall some popular site (want to say it was AirBnB) which threw an error "password cannot contain name/username" for basically anything it didn't like, regardless of whether the password actually contained that, and it's very annoying.
It was one of the most welcomed changes to the password system at a former work place when I convinced the small team behind the authentication to put the requirements plain and simple and change from red to green as people met the requirements. We also added a passphrase helper that could be summoned if they missed requirements a few times which based on metrics got some fair use.
People generally want to do well by security and it's on their mind, but no one wants to look stupid because they can't think of a password that meets unknown requirements. Make it clear what's expected, and even a nudge towards how to think of good passphrases, and you'll get happy people using your site.
I change my password with something randomly generated by my password manager, and the site accepts it, and as far as I know I'm good to go. Then next time I try to log into the website, it doesn't accept the password it previously (falsely) accepted before, and I have to reset it again and play the guessing game of what special character it didn't like. Madness.
Regarding the null, if it's C based, theoretically your password just stops there. All other chars after that would be ignored.
Now I wonder, what would other non-C languages do if they see 0x00 in a string?
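As one data point, Python strings and bytes carry an explicit length, so a NUL is just another character. The truncation only shows up when the data is read as a C-style string, which `ctypes` can demonstrate (the sample password is made up):

```python
import ctypes

password = b"secret\x00ignored"

# Python bytes know their own length, so the NUL byte is kept.
print(len(password))  # 14

# Copy the same bytes into a C char buffer.
buf = ctypes.create_string_buffer(password)

# Reading it as a NUL-terminated C string stops at the first NUL.
print(buf.value)      # b'secret'

# The raw buffer still holds everything, plus the trailing NUL that ctypes adds.
print(len(buf.raw))   # 15
```

So the risk is less about the language the site is written in and more about whether the password ever crosses a C API that treats it as NUL-terminated.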
ctrl-H usually works.
Possibly they're preparing for password entry on more ubiquitous devices with limited keyboards? (ATMs, credit card keypads).
Although you should probably not allow "1234" as passwords or anything on the top 100 list for that matter.
I've heard banks and other financial institutions use the "our ancient mainframe only allows 8 characters in account passwords" excuse or "our ancient mainframe database can only handle 8 characters in the password column", and find it extremely hard to believe.
First of all, I find it hard to believe that each customer has a user account on the mainframe, and so the mainframe's restrictions on account passwords is irrelevant. Your banking account is going to be entirely something defined by the database.
Second, I find it hard to believe that they are running their web server on their ancient mainframe OS. The web server is going to be running on something more modern. Users have to go through that to do online banking, and the account system on that can be totally separate from whatever account system is running on the backend banking system. Your user name (if their online banking uses something other than your account number) and your password for online banking should be entirely handled on the Unix or Unix-like or Windows Server that is running their web-facing stuff. The ancient mainframe stuff should never see it.
Why? Do you actually have any experience in this area? I do, and I can tell you, they do exactly that. Then multiple systems integrate with that mainframe, often using the user account as the unique identifier for the entire organization. Migrations are an absolute nightmare.
> Users have to go through that to do online banking, and the account system on that can be totally separate from whatever account system is running on the backend banking system.
It can be, but it isn't. Thus, the problem.
Honestly, this type of "hardly believe" take is what every new employee right out of college (myself included, 15 years ago) offers when they come up with ten thousand "simple" ideas for improvement without any organizational, political, or system understanding. Then they act confused when their ideas aren't instantly implemented, because they don't even understand what it is they're proposing or why it is complicated.
Banks have been trying to get off of mainframes for 30 years or more at this point, spent tens of millions of dollars, but had someone just told them to "run a web server in front of it" this could all have been avoided.
I am talking about Oracle Database Users and Oracle Database's password limitations therein. The reason for Oracle Database's password restrictions isn't to do with how they're stored on disk (which is secure as of 12c[0]), it is to do with how they were implemented originally (i.e. passwords are implemented as database objects, and database objects have max lengths and other naming rules which apply to passwords).
That is because I want to be always able to easily access these accounts even when traveling or losing access to my technological devices. Though sadly these days things like 2FA make my life much harder in that regard.
Not only is SMS generally considered insecure, it also falls down dramatically when I travel to the mountains where there's no cell coverage and try to do things on the cabin wifi.
There are fairly limited scenarios when a password manager is better than a plain text document. And if it's online to actually share passwords between devices it's strictly worse.
How do the multiple systems communicate with each other and with the mainframe?
Then you have multiple half-measure attempts at migration away from the mainframe, so you wind up with a half-built Java layer and a half-built [Large Contractor] bespoke system.
There is something that handles the user's submission from that site and records somehow that the user is logged in, and directs the user's browser to some site that can somehow get their account information, balances, etc, from whatever that is stored, and can handle form submissions that request operations on those accounts such as transfers, bill pay, and such.
What I don't understand is where the difficulty would be in making it so customers do not talk directly to those web-based things, and instead talk to a web front end running on a separate, reasonably modern Unix, Unix-like, or Windows Server, and that server talks to those older web-based things. It could even talk to them over HTTP, looking like a browser to the older components. The online banking front end would not have any direct integration with the rest of their systems. It would just go through the same interfaces they already are using to support online banking.
Heck, that is essentially what Plaid does for banks that don't have a useable API. They log in to the bank's online banking site with the customer's credentials and then screen scrape to get the account balances. That has got to be a nightmare for Plaid because they have to deal with many different banks, any of which might make changes that break their scraping with no notice.
A bank essentially doing its own Plaid that just has to work with its current online banking site should be a lot more doable.