“Companies spend millions of dollars on firewalls, encryption and secure access devices, and it’s money wasted, because none of these measures address the weakest link in the security chain.”
– Kevin Mitnick
What is the weakest link in the security chain? Humans, of course. It seems that every few weeks another data breach hits the news: data leaked by a disgruntled employee, a stolen laptop, or socially engineered access to a company’s database. A few million passwords here, a few million passwords there… assume that the super secret P@ssw0rd protecting your sensitive data will be leaked!
With an appropriate hashing strategy we can prevent hackers from recovering passwords, even after a data breach has occurred. I will outline some of the flawed approaches to password storage, then describe how you should really do it (or you can watch this video from the excellent YouTube channel Computerphile, which inspired this article).
Plain text :-(
Storing passwords in plain text in 2018 is not only grossly incompetent, but highly unethical. We are all aware that users recycle passwords across websites, so even if a website isn’t storing traditionally sensitive data like addresses or bank details, security is no less important; an email and password combination is among the most sensitive pieces of data in today’s world.
If a website ever emails you your password, it is storing that password in a recoverable form. Assume your data has been breached and change your password immediately. If you use that password on any other site, change it there as well; better still, avoid reusing passwords across sites.
Here are a few notable breaches:
- 000webhost, 2016: 13 million accounts
- Comcast, 2015: 0.59 million accounts
- Yahoo, 2012: 0.5 million accounts
What’s that you say? I thought encryption was good, why shouldn’t I encrypt my passwords?
Encryption is a widely used method of protecting data on the web; in fact, the security of the web largely depends on it. It is not, however, suitable for storing passwords. An encryption key is used to scramble data in a reversible way and, in the simple deterministic schemes discussed here, the same input always gives the same output.
There is nothing new about this idea. Julius Caesar even had his own cunning method of encryption, the well-known ‘Caesar Cipher’, where each letter of the alphabet is substituted for another a fixed number of places further along. For example, if our encryption key is 3 we get
A -> D
B -> E
C -> F
Obviously modern-day encryption methods are vastly superior, but the same principle applies: what can be encrypted can be decrypted.
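The Caesar example above can be sketched in a few lines of Python. This is only an illustration of reversibility; the function names `caesar`, `encrypt` and `decrypt` are made up for this sketch, and it only handles the letters A to Z.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABC...XYZ"


def caesar(text: str, key: int) -> str:
    """Shift each letter A-Z by `key` places, wrapping around at Z.

    Characters outside A-Z (digits, spaces, punctuation) are left unchanged.
    """
    shifted = []
    for ch in text.upper():
        if ch in ALPHABET:
            shifted.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            shifted.append(ch)
    return "".join(shifted)


def encrypt(plaintext: str, key: int) -> str:
    return caesar(plaintext, key)


def decrypt(ciphertext: str, key: int) -> str:
    # Decryption is just the same shift in the opposite direction.
    return caesar(ciphertext, -key)


ciphertext = encrypt("ABC", 3)
print(ciphertext)              # DEF
print(decrypt(ciphertext, 3))  # ABC
```

Anyone who learns the key (here, 3) can run `decrypt` on every stored value, which is exactly the property we do not want for passwords.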
There are two main classes of encryption algorithm: symmetric and asymmetric. In symmetric encryption a single key is used to both encrypt and decrypt the data. In asymmetric encryption (e.g. RSA) a pair of keys is used: a public key for encryption and a private key for decryption.
A symmetrically encrypted data set suffers from two main weaknesses. Firstly, since the encryption is deterministic, a hacker may group together batches of matching ciphertext and focus their efforts on cracking the most common passwords. In 2013, over 130 million Adobe user accounts were affected by this. Secondly, if a hacker obtains or figures out your encryption key, all is lost.
There exists a reasonably good strategy using asymmetric encryption. An individual private/public key pair can be generated for each user; the private key is discarded and the password is encrypted using the public key. The public key and ciphertext are then stored in the database. This way every user has a unique encryption key, and the key needed for decryption cannot be recovered. When a user tries to log in, the incoming password is encrypted using the stored public key and compared to the stored ciphertext. The main weakness here is simply that a better solution exists in salted hashes. It’s generally faster to brute force an encryption algorithm than a good password hashing function. See this question on security.stackexchange by user ‘Nonyme’ for a more detailed Q&A on this technique.
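To make the first weakness concrete, here is a small sketch of the grouping step. Everything in it is made up for illustration: `toy_encrypt` is a deliberately insecure XOR stand-in for a deterministic cipher, and the key and password list are invented. The point is only that the attacker can count matching ciphertexts without ever knowing the key.

```python
from collections import Counter


def toy_encrypt(password: str, key: bytes) -> str:
    """Hypothetical stand-in for a deterministic cipher: XOR with a repeating key.

    NOT secure; it exists only so that equal inputs give equal outputs.
    """
    data = password.encode("utf-8")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data)).hex()


KEY = b"server-side-secret"  # made-up key; the attacker never needs to see it

# Invented sample data standing in for a leaked password column.
leaked_passwords = ["123456", "hunter2", "123456", "letmein", "123456", "hunter2"]
leaked_column = [toy_encrypt(p, KEY) for p in leaked_passwords]

# The attacker only looks at the ciphertexts and counts duplicates.
groups = Counter(leaked_column)
most_common_ciphertext, count = groups.most_common(1)[0]
print(count, "accounts share the ciphertext", most_common_ciphertext)
```

Cracking that one ciphertext (for instance by guessing the most popular passwords, or by reading password hints) unlocks every account in the group at once.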
Hashing with a sprinkle of salt :-)
We’re getting close. A hash function maps data of any size to a fixed-size value. Cryptographic hash functions are a subset of these, specifically designed so that the hash is one way: computing the hash is easy, but recovering the input from it is practically infeasible.
For example, using the bcrypt password hashing algorithm:
password -> $2y$10$vRO2G7Ub9Nwdj4vLAJ5HpO6FB5DAvl22IdcslzQ5K6aEbm...
password -> $2y$10$NoHzJFvwhbDBSpfntDmtV.nCZkaDh11Z/.aNzNWjy4Yj6c...
passw0rd -> $2y$10$z.1BPj3C/bPTwIedNZ0fIx1.n2.LnFePpPPjUDekJj/OFi...
p@ssw0rd -> $2y$10$vtLvOI1o.Ht5mzZ1TDcKwOn7XSMwWpkeVGQzP10JB/rkJ6...
Notice that with each single-character difference the generated hash is completely different from the last. Even hashing the same password twice in a row produces a different result (the $2y$10$ prefix records the bcrypt version, 2y, and the cost factor, 10, and is not really part of the hash). This ensures that an attacker attempting to brute force a password cannot tell how close they are to the correct value.
To authenticate a login attempt, we hash the submitted password and compare the result with the hash stored in our database. If our credentials database is compromised, the attacker cannot simply read the passwords off, as there is no way to get `password` back from its hash other than brute forcing candidate strings until one generates the same hash.
Stored along with each hash is a random string of bytes known as a salt. This protects our set of passwords from the batching attack that Adobe suffered in 2013, because identical passwords no longer produce identical hashes. Salting also makes it impractical to use a precomputed lookup table of hashes.
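The "completely different" effect of a single-character change is easy to measure. The sketch below uses SHA-256 from Python's standard library purely as an illustration, since bcrypt is not in the standard library; SHA-256 is a fast general-purpose hash and is not a password hashing function on its own. Unlike the bcrypt output above, plain SHA-256 has no salt, so the same input always gives the same digest. The helper names are made up for this example.

```python
import hashlib


def sha256_as_int(text: str) -> int:
    """Return the 256-bit SHA-256 digest of `text` as an integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def differing_bits(a: str, b: str) -> int:
    """Count how many of the 256 digest bits differ between two inputs."""
    return bin(sha256_as_int(a) ^ sha256_as_int(b)).count("1")


for candidate in ("passw0rd", "p@ssw0rd"):
    print(candidate, differing_bits("password", candidate), "of 256 bits differ")
```

For a well-behaved hash, roughly half of the bits flip for any change in the input, so a near miss looks no different from a wild guess.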
Note: the bcrypt package generates the salt and stores it inside the hash string for us, so we don’t have to worry about it.
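For readers who want to see what a salted hash involves under the hood, here is a minimal sketch using only Python's standard library. It swaps in PBKDF2 (`hashlib.pbkdf2_hmac`) for bcrypt, because bcrypt is not in the standard library. The iteration count and the function names `hash_password` and `verify_password` are assumptions made for this example, not recommendations.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; an assumption, not a recommendation


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage. A fresh random salt is used every time."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the login attempt with the stored salt and compare the digests."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(candidate, expected)


# Two users who pick the same password end up with different stored hashes.
salt_a, hash_a = hash_password("password")
salt_b, hash_b = hash_password("password")
print(hash_a != hash_b)                             # True

print(verify_password("password", salt_a, hash_a))  # True
print(verify_password("passw0rd", salt_a, hash_a))  # False
```

Because the salt differs per user, matching hashes can no longer be batched together, and a lookup table would have to be rebuilt for every salt. In a real application a maintained library such as the bcrypt package does all of this for you.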
Responsibility of users
Of course, none of this really matters if our users pick stupid passwords like sUp3rMan! (no offence).
You can use this excellent tool developed by Dropbox to get a good estimate of the quality of your password: Realistic password strength estimation.