An Introduction to Key Derivation Functions: Argon2, Scrypt, and PBKDF2
Bitcoinist, libertarian, atheist, cryptography fan, and founder of http://qvault.io
A key derivation function
, or KDF, derives one or many secret keys from a secret value. Therefore, If you've ever needed to store a password in a database or create a private key from a password, you may have used a KDF.
For instance, some examples of popular KDFs:
Key derivation functions are a must-have for most web applications. The number of sites that use plain-text or weak-hash password storage is frightening. If a site ever emails you a copy of your password... RUN!
Are KDFs just hash functions?
No, but there is overlap. In order to understand KDFs, let's first go through a quick refresher on hash functions.
Some hash functions for example:
A hash function takes an input and creates an output. In most password hashing scenarios it looks something like this:
sha256("password123") -> ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f
The function must have the following properties:
- It scrambles data deterministically (Same input, same output)
- No matter the input, the output of a hash function always has the same size
- It cannot retrieve the input from the output (one-way function)
So what is the difference?
There are different types of KDFs. Some are based on stream or block ciphers, but in this article, we will focus on the most common type, hash-based key derivation functions.
As it turns out, all hash-based KDFs are secure hash functions, but not all hash functions are hashed based KDFs.
- Key Stretching
- Key Whitening
- Key Separation
- Key Strengthening
Let's look at each case separately, with the following definition of our general KDF in mind:
derivedKey = keyDerivationFunction(originalKey, salt, difficulty)
Salt is random data used to protect against pre-computation attacks or rainbow tables.
Difficulty can be used to make the KDF slower via intense computation, memory, or parallelism requirements. This protects against brute force attacks because it will take an attacker longer per guess.
Key stretching is the most common use case for the average developer. The idea is to take a key with low-entropy (security or randomness) and stretch it into a longer key that is more secure. Passwords are undoubtedly a great example. For example, many websites use Bcrypt to stretch keys:
passwordForDB = bcrypt(password, salt, difficulty)
KDFs allow child keys to be created from a master key. This can be used in applications like Bitcoin where child keys can control sections of a wallet. However, only the master has full control. This is done through the use of different salts. For example:
childOne = kdf(masterKey, saltOne, difficulty)
childTwo = kdf(masterKey, saltTwo, difficulty)
childThree = kdf(masterKey, saltThree, difficulty)
Strengthing extends a key with a random salt, but then deletes the salt
so it can't be used again. This makes the resulting key stronger without adding significant vulnerabilities to the system.
Should I Use KDFs?
Yes. Most often when storing passwords in databases, but also if any of these other use cases fall into the domain of your code. Tweet at me if you have comments or questions. To read more check out the HKDF paper
Subscribe to get your daily round-up of top tech stories!