The Do’s and Don’ts of Writing Crypto Code

Writing cryptographic software or adding encryption to an app is an undertaking with numerous pitfalls for a first-timer. And for those already experienced in dealing with crypto matters, simple carelessness or overconfidence can lead to catastrophic results.

In this article we’ve compiled a list of the most common or especially dangerous mistakes developers make when implementing cryptography in their software, what to look out for, and what to do to avoid them (best cryptographic practices). Some of them apply to the broader risk surface every app has, not only to apps that handle cryptographic material and sensitive data.

Top Mistakes Made by Developers Who Deal with Crypto

The most common mistakes listed here (in our opinion at Cossack Labs) are not directly linked to cryptographic processes and encryption, but making them renders cryptography useless at best; at worst, they introduce vulnerabilities, open the system up to exploits, and may lead to malfunction or DoS. A little attentiveness goes a long way, and the following practices should serve as a checklist of what not to do in your software and where it may lead if you do.

Buffer copy without checking the input size (classic buffer overflow)

When you try to cram in more data than a container can hold, you’re going to create a mess. Copying an untrusted input without checking its size would be a classic example of a mistake leading to a buffer overflow.

Buffer overflows can often be used to execute arbitrary code, which is usually outside the scope of a program’s implicit security policy, and this can be used to subvert any other security service. Buffer overflows also lead to crashes and open the program up to malicious external actions such as DoS or putting the program into an infinite loop.

Prevention/Mitigation:

Double-check that your buffer is as large as you specify;
When using functions that accept a number of bytes to copy (such as strncpy()), remember that if the destination buffer size equals the source buffer size, the function may not NULL-terminate the string;
Check the buffer boundaries when accessing the buffer in a loop, and make sure you are not writing past the allocated space;
For any security checks performed on the client side, make sure they are duplicated on the server side to avoid CWE-602. Attackers can bypass client-side checks by modifying the values after the checks have been performed or by changing the client to remove the checks entirely.
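To make the first rule concrete, here is a minimal C sketch of a copy that checks the size of an untrusted input before touching the destination. The function name checked_copy and its return convention are hypothetical, chosen for illustration rather than taken from any particular library.

```c
#include <stddef.h>
#include <string.h>

/* Copy an untrusted, NUL-terminated string into a fixed-size destination.
 * Returns 0 on success, -1 if the input plus its terminator does not fit.
 * On failure the destination is left as an empty string. */
int checked_copy(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;

    size_t src_len = strlen(src);
    if (src_len >= dst_size) {      /* need room for the terminating NUL */
        dst[0] = '\0';
        return -1;
    }

    memcpy(dst, src, src_len + 1);  /* copies the NUL terminator too */
    return 0;
}
```

Rejecting oversized input, rather than silently truncating it, keeps a too-long key or token from being quietly cut down into a different value.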

Accessing buffer with incorrect length value

Another simple mistake that can lead to a buffer overflow and its consequences. It happens when software uses a sequential operation to read or write a buffer but uses an incorrect length value, which causes it to access memory outside the bounds of the buffer. When the length value exceeds the size of the destination, a buffer overflow can occur.

Prevention/Mitigation:

Apply the prevention/mitigation rules for the classic buffer overflow vulnerability.
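A frequent source of an incorrect length value is a length field supplied by the sender. The C sketch below assumes a hypothetical wire format (a 2-byte big-endian length followed by that many bytes; read_record is an illustrative name) and checks the declared length against both the bytes actually received and the capacity of the output buffer before copying anything.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: 2-byte big-endian length, then that many bytes.
 * Returns 0 on success, -1 if the declared length is inconsistent with the
 * bytes actually received or with the capacity of the output buffer. */
int read_record(const uint8_t *in, size_t in_len,
                uint8_t *out, size_t out_size, size_t *out_len)
{
    if (in_len < 2)
        return -1;                      /* not even a complete length field */

    size_t declared = ((size_t)in[0] << 8) | in[1];

    if (declared > in_len - 2)          /* claims more data than was received */
        return -1;
    if (declared > out_size)            /* would write past the destination */
        return -1;

    memcpy(out, in + 2, declared);
    *out_len = declared;
    return 0;
}
```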

Using potentially dangerous functions

A programmer’s toolbox is chock-full of digital ‘power tools’ that should be handled with care, including libraries and API functions that make assumptions about how they will be used, with no guarantees of safety if they are abused. When potentially dangerous functions are used improperly, things can get messy very quickly. For instance, the following usages are dangerous:

usage of non-random IV with CBC mode of a block cipher like AES,
usage of insufficient entropy / small / same / predictable seed for PRNG,
usage of cryptographically weak PRNG.

Prevention/Mitigation:

Identify a list of prohibited API functions and forbid developers (or yourself) from using them (sometimes you’ll have to come up with safer alternatives). In some cases, automatic code analysis tools or the compiler can be instructed to spot the use of prohibited functions, for example with the “banned.h” include file from Microsoft’s SDL.
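As an illustration of the first two items, here is a minimal C sketch that takes a CBC IV from the operating system’s CSPRNG instead of a constant or a rand()/srand(time(NULL)) value, which an attacker can predict. It assumes a Unix-like system exposing /dev/urandom (on Linux, getrandom(2) is an alternative, and OpenSSL’s RAND_bytes() serves the same purpose); the function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

#define IV_LEN 16  /* AES block size in bytes */

/* Fill buf with len bytes from the operating system's CSPRNG.
 * Assumes a Unix-like system exposing /dev/urandom. Returns 0 on success. */
int random_bytes(unsigned char *buf, size_t len)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(buf, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}

/* Generate a fresh, unpredictable IV for every CBC encryption.
 * Never reuse an IV, never hard-code one, and never derive it from rand(). */
int new_cbc_iv(unsigned char iv[IV_LEN])
{
    return random_bytes(iv, IV_LEN);
}
```

The IV does not need to be secret and is usually sent alongside the ciphertext; for CBC, what matters is that it is unpredictable and never reused.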

Rolling your own crypto

It may be tempting to develop your own encryption scheme in the hopes of making it difficult for the attackers to crack. However, such homegrown cryptography is a “welcome” sign for potential attackers.

Prevention/Mitigation:

Select a well-vetted algorithm approved and recommended by cryptography experts, and select well-tested implementations (the source code should be available for analysis). We might be biased, but we recommend using Themis for encryption, as it is a well-tested, modular, Apache 2-licensed open-source crypto library that currently uses OpenSSL as a source of its crypto primitives.

Incorrectly calculating the buffer size

In languages where memory management is the programmer’s responsibility (such as C), there are many opportunities for making a mistake. If the buffer size is calculated incorrectly, the buffer may be too small to contain the data that the programmer intends to write, even if the input was properly validated. Any number of problems could lead to an incorrect calculation, but in the end you’re going to run head-first into a buffer overflow.

Prevention/Mitigation:

If you allocate a buffer for the purpose of transforming, converting, or encoding an input, make sure that you allocate enough memory for the largest possible encoding. For example, in a routine that converts “&” characters to “&amp;” for HTML entity encoding, you will need an output buffer that is at least 5 times as large as the input buffer;
Pay close attention to byte size discrepancies, precision, signed/unsigned distinctions, truncation, conversion and casting between types, “not-a-number” calculations, and how your language handles numbers that are too large or too small for its underlying representation. Keep in mind the 32-bit, 64-bit, and other platform differences that may affect numeric representation;
When processing structured incoming data that contains a size field followed by raw data, make sure that you identify and resolve any inconsistencies between the size field and the actual data size;
When allocating memory that uses sentinels to mark the end of a data structure (such as NULL bytes in strings), make sure you also include the sentinel in your calculation of the total amount of memory that must be allocated;
Use sizeof() on the appropriate data type to avoid CWE-467 (“Use of sizeof() on a Pointer Type” error);
Examine compiler warnings closely and eliminate problems with potential security implications, such as signed/unsigned mismatches in memory operations or the use of uninitialized variables. Even if the weakness is rarely exploitable, a single failure may lead to the compromise of the entire system.
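The HTML-encoding rule, together with the sentinel and overflow rules, can be sketched in C as follows. This is a minimal illustration (escape_ampersands is a hypothetical name) that only handles “&”; a real HTML encoder escapes more characters.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Escape '&' as "&amp;". The worst case is an input made entirely of '&',
 * so the output needs 5 bytes per input byte plus 1 for the NUL sentinel.
 * Returns a heap-allocated string (caller frees) or NULL on error. */
char *escape_ampersands(const char *in)
{
    size_t in_len = strlen(in);

    /* Guard the size calculation itself against integer overflow. */
    if (in_len > (SIZE_MAX - 1) / 5)
        return NULL;

    char *out = malloc(in_len * 5 + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < in_len; i++) {
        if (in[i] == '&') {
            memcpy(p, "&amp;", 5);
            p += 5;
        } else {
            *p++ = in[i];
        }
    }
    *p = '\0';  /* the sentinel was included in the allocation above */
    return out;
}
```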

Improper validation of array index

Another common mistake is allowing software to use untrusted input when calculating or using an array index. When the software doesn’t validate the index (or validates it incorrectly) to verify that it references a valid position within the array, the consequences are unpleasant:

Using an index outside of the array bounds will very likely result in the corruption of relevant memory and, perhaps, instructions, leading to a crash if the values are outside of the valid memory area.
If it is the data that is corrupt, the system will continue to function with improper values.
Using an index outside the bounds of an array can also trigger out-of-bounds read or write operations or operations on the wrong objects. This may result in exposure or modification of sensitive data.
If an attacker can effectively control memory, it may be possible to execute arbitrary code (as with a standard buffer overflow) even without using large inputs if a precise index can be controlled.
A single fault can lead to an overflow (CWE-788) or underflow (CWE-786) of the array index. The consequences depend on the type of operation performed out of bounds, but they can expose sensitive information, cause a system crash, or possibly lead to execution of arbitrary code.

Prevention/Mitigation:

Use an input validation framework such as Struts or the OWASP ESAPI Validation API (e.g. ESAPI enables you to use ESAPI::getValidator()->getValidInput( … )). If you use Struts, be mindful of the weaknesses covered in CWE-101.
Be especially careful when validating all input for code that crosses language boundaries (e.g. from an interpreted language to native code), as this could create an unexpected interaction between these boundaries.
Make sure that you are not violating any of the expectations of the language with which you are interfacing. For example, even though Java may not be susceptible to buffer overflows, providing a large argument in a call to the native code might still trigger an overflow.
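On the native side of such a boundary, the validation itself is simple but must cover both ends of the range. A minimal C sketch, assuming a hypothetical lookup table and lookup() function that receive an index from untrusted input:

```c
#include <stddef.h>

#define TABLE_SIZE 8

static int table[TABLE_SIZE] = {10, 11, 12, 13, 14, 15, 16, 17};

/* Look up an element using an index that came from untrusted input.
 * Both the lower bound (negative values, CWE-786) and the upper bound
 * (CWE-788) are checked before the array is touched.
 * Returns 0 and writes *out on success, -1 if the index is invalid. */
int lookup(long idx, int *out)
{
    if (idx < 0 || (size_t)idx >= TABLE_SIZE)
        return -1;
    *out = table[idx];
    return 0;
}
```

Checking idx < 0 first matters: casting a negative value to size_t without that check would turn it into a huge index.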

Uncontrolled format string

All successful relationships depend on clear communication, and software is no exception. Format strings are often used for sending and receiving well-formed data. By controlling a format string, an attacker can control the input or output in unexpected ways and sometimes even execute code.

Prevention/Mitigation:

Make sure that all format string functions are passed a static string which cannot be controlled by the user and that the proper number of arguments is always sent to that function, too. If at all possible, use functions that don’t support the %n operator in format strings.
Pay attention to compiler and linker warnings; they may alert you to improper usage.
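The difference between an uncontrolled and a static format string fits in one line of C. In this minimal sketch (log_user_message is a hypothetical name), attacker-controlled text is only ever passed as an argument to a constant format string; GCC’s -Wformat-security flag warns about the dangerous variant.

```c
#include <stdio.h>

/* Format a message that may contain attacker-controlled text.
 * WRONG: printf(user_msg); -- any '%' sequences in user_msg are interpreted,
 *        so "%x" leaks stack data and "%n" can write to memory.
 * RIGHT: a constant format string; user data is only ever an argument. */
int log_user_message(char *dst, size_t dst_size, const char *user_msg)
{
    return snprintf(dst, dst_size, "user said: %s", user_msg);
}
```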

Integer overflow or wraparound

Integers are not Chuck Norris, so they have their limits. And machines can’t count to infinity, even if it sometimes feels like they take that long to complete an important task. When programmers forget that computers don’t do math like people, bad things happen, ranging from faulty price calculations to infinite loops and crashes.

Prevention/Mitigation:

Perform input validation on any numeric input by ensuring that it is within the expected range.
Enforce the rules that make sure the input meets both the minimum and maximum requirements for the expected range.
Use unsigned integers where possible. This makes it easier to perform sanity checks for integer overflows. If you absolutely must use signed integers, make sure that your range check includes minimum values as well as maximum values.
The prevention/mitigation rules for incorrectly calculating the buffer size also apply here.
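In C, the most dangerous wraparound is usually a size calculation. The minimal sketch below (function names are hypothetical) checks a multiplication before performing it, so that a huge untrusted count cannot wrap around to a small allocation, and shows a range check that tests both ends for signed input. Note that unsigned arithmetic wraps in a well-defined way, while signed overflow is undefined behavior in C.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate `count` elements of `elem_size` bytes, where `count` comes from
 * untrusted input. count * elem_size could wrap around to a small number,
 * so the product is checked before it is computed. */
void *alloc_array(size_t count, size_t elem_size)
{
    if (count == 0 || elem_size == 0)
        return NULL;
    if (count > SIZE_MAX / elem_size)   /* product would overflow size_t */
        return NULL;
    return malloc(count * elem_size);
}

/* Range check for a signed value: test both the minimum and the maximum. */
int in_range(long value, long min, long max)
{
    return value >= min && value <= max;
}
```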

Ignoring compiler warnings

Popular compilers have been in development for decades (e.g. the first release of GCC dates back to March 1987) with the help of hundreds of contributors, which means that many security problems can be caught by compiler warnings. Yet those warnings are often ignored.

Prevention/Mitigation:

Compile your code using the highest warning level available for your compiler and eliminate warnings by modifying the code.
Use static and dynamic analysis tools to detect and eliminate additional security flaws.

Common Sense Cryptographic Practices to Follow

The previous chapter of this article covered the mistakes unrelated to cryptography that could render any further encryption in your code useless by making your code vulnerable to various attacks. This chapter covers the best general practices for correct and secure implementation of cryptographic tools and approaches in your code.

Avoid using passwords as encryption keys

Using passwords as encryption keys makes them highly vulnerable to keysearch attacks. Most users choose passwords that lack sufficient entropy to resist such attacks.

Solution: Use a truly random encryption/decryption key, not one deterministically generated from a password/passphrase. If a key has to be derived from a password, we recommend using PBKDF2 (which we included in Themis), which uses iterative hashing (along the lines of H(H(H(…H(password)…)))) to slow down a dictionary search.

Use a sufficient number of iterations to make this process take, say, 100ms to generate the key on the user’s machine.
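The H(H(H(…))) notation is a simplification. As specified in RFC 8018, PBKDF2 builds each block T_i of the derived key DK by XOR-ing c successive outputs of a pseudorandom function PRF (typically HMAC-SHA-256) keyed with the password P, starting from the salt S and INT(i), the 4-byte big-endian encoding of the block index i:

```latex
\begin{aligned}
U_1 &= \mathrm{PRF}\bigl(P,\; S \,\|\, \mathrm{INT}(i)\bigr) \\
U_j &= \mathrm{PRF}\bigl(P,\; U_{j-1}\bigr), \qquad j = 2, \dots, c \\
T_i &= U_1 \oplus U_2 \oplus \cdots \oplus U_c \\
DK  &= T_1 \,\|\, T_2 \,\|\, \cdots \,\|\, T_\ell \quad \text{(truncated to the desired key length)}
\end{aligned}
```

Because the work grows linearly with the iteration count c, a rough way to hit the 100 ms target is to time one trial run with c_0 iterations on the target machine (taking t_0) and scale: c ≈ c_0 × (100 ms / t_0). Treat this as a rule of thumb rather than a precise benchmark.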

Be careful when concatenating multiple strings before hashing; use combinations of each string’s hash instead

Concatenation leaves the boundary between the two strings ambiguous. For example:

builtin||securely = built||insecurely

car||skill = cars||kill

Put differently, the hash H(S||T) does not uniquely identify the strings S and T. Therefore, the attacker may be able to change the division between the two strings without changing the hash.

For instance, if Alice wanted to send the two strings “builtin” and “securely”, the attacker could change them to strings “built” and “insecurely” without invalidating the hash. Similar problems arise when applying a digital signature or message authentication code to a concatenation of strings.

Rather than using plain concatenation, use encoding that is unambiguously decodable. For instance, instead of computing H(S||T), compute H(length(S)||S||T), where length(S) is a 32-bit value that denotes the length of S in bytes. Another solution would be using H(H(S)||H(T)), or even H(H(S)||T).
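A minimal C sketch of the length-prefixed encoding, length(S)||S||T, with length(S) written as a 32-bit big-endian value. The hash itself is left out: the output buffer is what you would pass to the hash, MAC, or signature function of a well-vetted library. The function name encode_pair is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write length(S) || S || T into out, where length(S) is the 32-bit
 * big-endian byte length of S. The result is what gets hashed, signed or
 * MACed instead of the ambiguous S || T.
 * Returns the number of bytes written, or 0 if the input is too long or
 * out is too small. */
size_t encode_pair(const char *s, const char *t, uint8_t *out, size_t out_size)
{
    size_t s_len = strlen(s), t_len = strlen(t);

    if (s_len > UINT32_MAX)                     /* must fit in 32 bits */
        return 0;
    if (s_len > SIZE_MAX - 4 || t_len > SIZE_MAX - 4 - s_len)
        return 0;                               /* total size would overflow */
    size_t total = 4 + s_len + t_len;
    if (total > out_size)
        return 0;

    out[0] = (uint8_t)(s_len >> 24);
    out[1] = (uint8_t)(s_len >> 16);
    out[2] = (uint8_t)(s_len >> 8);
    out[3] = (uint8_t)s_len;
    memcpy(out + 4, s, s_len);
    memcpy(out + 4 + s_len, t, t_len);
    return total;
}
```

With this encoding, (“builtin”, “securely”) and (“built”, “insecurely”) still share the same 15 trailing bytes, but their length prefixes (7 and 5) differ, so their hashes differ as well.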

Try to avoid using the same key for different operations (e.g. encryption, authentication, signing, etc.)

Using a single key for multiple purposes may expose it to various subtle attacks. Give each key a single purpose and use it only for that purpose. If you need both signing and encryption, generate two keypairs: one for signing and one for encryption/decryption. Similarly, with symmetric cryptography, use one key for encryption and a separate, independent key for message authentication. Don’t reuse the same key for both purposes.

Don’t copy private keys, don’t store them as plain text, and don’t hard-code them in your software products

Use the key management principles and guidelines described in our Themis GitHub Wiki.

Keep comprehensive logs and audit trails

Extensive audit logging in every component of the distributed architecture is an important part of key management. Every access to the sensitive data must be logged with details about the function, the user (individual or application), the utilised encryption resources, the data accessed, and when the access took place.
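The fields listed above map naturally onto a log record. The C sketch below is one hypothetical way to structure and format such a record (the struct, field names, and line format are illustrative assumptions); note that it logs a key identifier, never the key material itself.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical audit record holding the fields listed above. */
struct audit_event {
    const char *function;   /* operation performed, e.g. "decrypt" */
    const char *principal;  /* user or application identity */
    const char *key_id;     /* encryption resource used: an ID, never the key */
    const char *resource;   /* data that was accessed */
    time_t      when;       /* time of access */
};

/* Format one event as a single line with an ISO 8601 UTC timestamp.
 * Returns the snprintf result, or -1 if the timestamp cannot be formatted. */
int format_audit_event(const struct audit_event *ev, char *out, size_t out_size)
{
    char ts[32];
    struct tm *utc = gmtime(&ev->when);
    if (utc == NULL || strftime(ts, sizeof ts, "%Y-%m-%dT%H:%M:%SZ", utc) == 0)
        return -1;
    return snprintf(out, out_size,
                    "%s function=%s principal=%s key_id=%s resource=%s",
                    ts, ev->function, ev->principal, ev->key_id, ev->resource);
}
```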

Follow the common sense non-cryptographic data security rules and best practices to prevent non-authorized access to your device with sensitive data

This point is very important even though it leaves the strictly cryptographic plane. We will repeat that strong cryptography DOES protect against all known theoretical attacks on the cryptography itself, but it WILL NOT guarantee a high level of security for a complex computer system against all possible real-world threats.

For an additional educational (and fun) read on non-cryptographic data security rules, see our Medium article on non-cryptographic security practices.

This is our take on the subject. If you have something to add to expand the list of mistakes or best practices, please reach out to us via @CossackLabs or email.