“There are two kinds of cryptography in this world: cryptography that will stop your kid sister from reading your files, and cryptography that will stop major governments from reading your files.” — Bruce Schneier
Dutch cryptographer Auguste Kerckhoffs once said that a system should be secure even if everything is known about the system other than its key. In the case of the internet, with web applications serving as the system, passwords have become the new key. It’s not a stretch to say that your most valuable information is probably hidden across all your social media, cloud storage, digital banking and so forth. So, the first and final line of defence between you and digital burglars potentially hijacking your life is none other than your password.
News publications all go bananas over security breaches, yet security is often the most daunting task for beginners in back-end development. I’ve been developing back-end systems for a few years now and it still makes me uneasy. So, this guide aims not to provide back-end code, but to give new developers a skeleton for good password systems.
So, you probably understand the basics of password usage:
But how does Step 2 happen? How is the password stored securely?
The foundation of Step 2 in any good password system lies in its use of hashing algorithms. In the real world, we don’t just store the password in our database. We store a representation of the password called a hash. So what is a hashing algorithm?
A hashing algorithm is a function that takes a string of arbitrary size as input and produces a fixed-length string called a hash. More simply, a hashing algorithm is a one-way function that converts text of any size into text of the exact same length each time.
hash(“password”)= 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
hash(“p4ssword”)= eca8e05d94c236e78c389e15e1cad71ff9326bdfa5e1d79d92766f38414e66e5
// 1 different character = completely different hashes
hash(“password”)= 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
// same input string = same output hash (every time)
This is made possible by scrambling the input text until it does not even remotely resemble the original string. For example, the commonly used SHA-256 algorithm always produces a 256-bit output, for a total of 2²⁵⁶ possible outputs. As a result, it’s extraordinarily difficult to create collisions (two different input strings that produce the same output). In fact, the chance that two given inputs collide is 1 in over 115 quattuorvigintillion (that’s a real number; it’s 78 digits long).
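These properties are easy to check for yourself. Here’s a minimal sketch using Python’s standard `hashlib` module; the helper name `sha256_hex` is just something made up for this example.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of `text` as a 64-character hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("password")
second = sha256_hex("p4ssword")
again = sha256_hex("password")

print(first)            # 64 hex characters = 256 bits
print(first == again)   # True: same input, same hash
print(first == second)  # False: one changed character, different hash
print(len(sha256_hex("a" * 10_000)))  # 64, however long the input is
```

Note that the output is always 64 hex characters, whether the input is 8 characters or 10,000.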
Instead of storing a plaintext password in the database when a new user is created, we store a hash of the password. Whenever that user later logs in, whatever password they log in with is hashed and compared with the stored hash to verify the user. That way, even if a hacker is able to break into the database and view its contents (which happens more often than you’d think), a hash effectively hides every password, even if the hacker knows which hashing algorithm was used.
Kerckhoffs would be proud.
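The store-a-hash, compare-on-login flow can be sketched in a few lines of Python. Treat this as an illustration of this one step only: the `users` dictionary is a hypothetical stand-in for a database, the function names are made up, and the rest of this guide explains why a bare hash like this isn’t enough on its own.

```python
import hashlib
import hmac

# Hypothetical stand-in for a real database table: username -> password hash.
users: dict[str, str] = {}


def hash_password(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    # Only the hash is stored; the plaintext password is never saved.
    users[username] = hash_password(password)


def login(username: str, password: str) -> bool:
    stored = users.get(username)
    if stored is None:
        return False
    # compare_digest runs in constant time, so it doesn't leak timing hints.
    return hmac.compare_digest(stored, hash_password(password))


register("alice", "1Q8w$BSLsC")
print(login("alice", "1Q8w$BSLsC"))   # True
print(login("alice", "wrong guess"))  # False
```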
Some things to remember:
1. A hashing algorithm will produce the same output for the same input.
2. A slightly different input will produce a completely different output.
3. A good hashing algorithm minimizes collisions.
Or just use bcrypt (covered in final section).
"cats_name_birth_year" = bad password
“1Q8w$BSLsC” = good password
First, what makes a secure password?
The best passwords aren’t necessarily the ones that include numbers, capital and lowercase letters, and special characters, nor the ones longer than 8–10 characters. Instead, what these web guidelines do is encourage users to add variation to their passwords to (hopefully) create a one-of-a-kind password. The best passwords are the ones that nobody has ever used before. Here’s why.
Let’s do some quick math.
Using 10 characters, your standard 26 lowercase letters give you 26¹⁰ possible passwords. That’s a big number: 141,167,095,653,376 to be precise. However, we also have 26 uppercase letters at our disposal (52 possible characters), which multiplies our total password count by about 1,000 (2¹⁰ = 1,024, to be exact). When you add numbers and special characters, for 95 printable characters in all, our total password count (barring conditions) is 95¹⁰ = 59,873,693,923,837,890,625.
That number is too astronomically large to even wrap your head around. A more understandable metric is how long it takes to crack a given password. Against a raw SHA-256 hash, it would take about 738 thousand years to brute-force the entire key space.
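If you’d like to check the math, here’s a quick sketch. The character-set sizes come from the counts above; the guess rate is purely an assumption for illustration (this guide doesn’t say which rate its estimate is based on), and `years_to_exhaust` is a made-up helper name. At a few million guesses per second, the answer lands in the hundreds of thousands of years.

```python
LENGTH = 10

lowercase = 26 ** LENGTH
mixed_case = 52 ** LENGTH
printable = 95 ** LENGTH  # letters, digits and special characters

print(f"{lowercase:,}")         # 141,167,095,653,376
print(mixed_case // lowercase)  # 1024
print(f"{printable:,}")         # 59,873,693,923,837,890,625


def years_to_exhaust(keyspace: int, guesses_per_second: float) -> float:
    """Years needed to try every password in the key space at a fixed rate."""
    seconds_per_year = 365.25 * 24 * 60 * 60
    return keyspace / guesses_per_second / seconds_per_year


# Assumed rate: 2.5 million guesses per second (not a measured figure).
print(round(years_to_exhaust(printable, 2_500_000)))
```

Try plugging in a faster rate to see how quickly that comfortable-looking number shrinks.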
Seems secure enough? Yeah, right.
Sort of relevant XKCD
A hacker’s best friend. (photo credits: kedeleducation)
Even though it’s practically impossible to reverse-engineer a given hash, everyone uses the same standardized hashing algorithms, like SHA, so the same password produces the same hash everywhere.
This means that hackers can compile huge datasets of common passwords, known as dictionaries, and hash each password themselves to see if any of the resulting hashes equals a stolen password hash. Another technique is to precompute the hashes of a dictionary and place them into a data structure known as a lookup table or rainbow table, making it EVEN easier to determine the original password. Since a large number of hashes only has to be computed once and can then be reused, these types of hacks can be very efficient.
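To see why this works so well, here’s a toy version of a precomputed lookup table. The five-word list is made up for this example (real dictionaries hold millions of leaked passwords), and a true rainbow table is a more compact structure than the plain dictionary used here.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# A tiny, made-up "dictionary" of common passwords.
common_passwords = ["123456", "password", "qwerty", "letmein", "passw0rd"]

# Precompute once: hash -> original password. The same table can be reused
# against any database that stores plain, unsalted SHA-256 hashes.
lookup_table = {sha256_hex(p): p for p in common_passwords}

# A hash as it might appear in a leaked database.
leaked_hash = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"

print(lookup_table.get(leaked_hash))  # password
```

Cracking the leaked hash takes a single dictionary lookup. A password that isn’t in the list, like “1Q8w$BSLsC”, simply isn’t found.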
The best way to combat this method of hacking is to use salts: additional random characters added to the password when hashing. Each password requires a different salt, but the plaintext salt can be stored right alongside the password hash. How is this secure?
Salting protects your passwords because it renders every hash stored in a lookup table useless: adding a salt completely changes the hash. Hackers ultimately have to resort to brute-forcing each password with its own salt. Your passwords are (more or less) safe again.
hash(“password”) = 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
hash(“password” + “salt”) = 7a37b85c8918eac19a9089c0fa5a2ab4dce3f90528dcdeec108b23ddf3607b99
// completely different hash
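Here’s a minimal sketch of per-password salting in Python. The function names and the 16-byte salt size are my own choices for illustration, and the salt is appended after the password to match the example above.

```python
import hashlib
import hmac
import secrets


def hash_with_salt(password: str, salt: str) -> str:
    # Same order as the example above: password first, then salt.
    return hashlib.sha256((password + salt).encode("utf-8")).hexdigest()


def make_record(password: str) -> tuple[str, str]:
    """Return (salt, hash); both are stored side by side in the database."""
    salt = secrets.token_hex(16)  # 16 random bytes, written as 32 hex characters
    return salt, hash_with_salt(password, salt)


def verify(password: str, salt: str, stored_hash: str) -> bool:
    return hmac.compare_digest(stored_hash, hash_with_salt(password, salt))


salt_a, hash_a = make_record("password")
salt_b, hash_b = make_record("password")

print(hash_a == hash_b)                    # False: same password, different salts
print(verify("password", salt_a, hash_a))  # True
print(verify("p4ssword", salt_a, hash_a))  # False
```

Two users with the same password now end up with different hashes, so a precomputed table of unsalted hashes matches neither of them.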
So can you just take a SHA-256 implementation, throw in a secure salt, and call it a day? Just hold your horses for a second.
Turns out you can’t just roll your own authentication. When the only limit to cracking your password is how fast a GPU can run (and GPUs get faster every year), your security system will start to resemble a ticking time bomb. To combat this, most modern authentication systems use a technique called key stretching.
(Think of it this way: a stretched-out key takes longer to fit into the keyhole when you’re checking whether it’s the right key.)
How key stretching works is beyond the scope of this guide, but essentially the technique makes it extremely difficult for hackers to test out a bunch of hashes in a short amount of time by slowing down the hashing algorithm and forcing hackers to wait (relatively) long times before a hash is produced.
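To get a feel for the idea, here’s a rough sketch using PBKDF2 from Python’s standard library. To be clear, PBKDF2 is a different key-stretching function, swapped in here only because it ships with Python. The iteration count is an assumed value for illustration, and the function names are made up.

```python
import hashlib
import hmac
import os

# Assumed work factor: more iterations means slower hashing for everyone,
# including an attacker. Tune this for your own hardware.
ITERATIONS = 200_000


def stretch(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    # Runs HMAC-SHA-256 `iterations` times instead of hashing just once.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def make_record(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)  # fresh random salt for each password
    return salt, stretch(password, salt)


def verify(password: str, salt: bytes, stored: bytes) -> bool:
    return hmac.compare_digest(stored, stretch(password, salt))


salt, stored = make_record("passw0rd")
print(verify("passw0rd", salt, stored))  # True
print(verify("password", salt, stored))  # False
```

A legitimate login pays the slowdown once, while an attacker pays it for every single guess.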
A popular implementation of key stretching is bcrypt, a password-hashing function designed specifically for passwords in 1999. In a world where technology quickly outpaces itself, the age of bcrypt is a testament to how powerful it is. Libraries are available for practically every popular language, from Node.js to C, and they even generate a cryptographically secure salt for you to prevent rainbow table attacks. Useful, right?
hash = bcrypt.hash(“passw0rd”, salt_rounds)
// salt_rounds sets the cost factor; bcrypt generates the salt itself
// cue Staples Easy Button
Use libraries like bcrypt, use 2-factor authentication, follow emerging trends in network security, and penetration-test frequently.
If you learned anything from this guide, I hope it’s this: never roll your own authentication.
Your security isn’t just competing with hackers; it’s competing with other security systems, since hackers can effortlessly reuse the hacking techniques from other attempts on your system.
Remember this key lesson, and you should know enough about security systems to get started. Good luck! 🙌
If you like crash courses in interesting computer science topics, check out elore! We make cool articles like this about topics ranging from machine learning to hackathons. 🐒