Building Application Backends with End-to-end Encryption

A closer look at the well-documented but rarely implemented properties of end-to-end encryption.

Intro

Security architectures and trust models are frequently defined and redefined. The Web, with its questionable code runtime, virtualised assets, and remote secret storage, constantly introduces new and interesting risks and security challenges.

Looking back, one of the most overlooked ideas in building secure systems is that workflows should not be bent to fit the available algorithms; rather, algorithms should be combined into systems that secure real-world scenarios. So far, the greatest impact of cryptography, and of applying extremely abstract algorithms, has come in fields that set out to solve real-world problems, e.g. end-to-end secure messaging.

As hardware keeps getting faster and the performance penalty for encryption matters less, could it be that users finally deserve to have their actual security problem fixed, not just a subset of it?

The resources and methods are there. And ideally we would want security systems that map real-world use-cases to security tooling, on a scale of 1:1.

Why even bother?

The starting point is simple — there are no cryptographic defences, just a bunch of access controls here and there.

Scheme #1: Unprotected infrastructure.

There are two ways of building encryption-based security: the end-to-end encryption (E2EE) approach and the non-end-to-end approach (which, for want of a shorter term, we will call non-E2EE). The first has historically focused on communication between users, as it is intuitively easier to justify building a complex cryptosystem to replicate the communication secrecy that exists in the real world. End-to-end encryption provides core security guarantees (the confidentiality and integrity parts of the “CIA triad”) between the actual clients and keeps out everyone except the entities directly engaged in a conversation.

In the case of non-E2EE, only one part of the data life-cycle is protected by a cryptographic solution, and only against the risks visible in that segment. The seams where the data is unencrypted are usually the places where risks become threats and threats become breaches.

Scheme #2: Segment-in-a-chain encryption. SSL and data-at-rest protection for the database (with possible “breach points” marked).

Now, let’s take a typical client-server metaphor and track the evolution of its security from non-E2EE to proper end-to-end encryption.

Scheme #3: In-app end-to-end encryption, which still leaves a lot to the imagination (there are unfilled resource gaps and the capabilities of a database are also limited).

The first step of this security evolution is making encryption and decryption take place where the data is actually used: in the application. But this is where the challenges only begin.

To share the data between different instances of an application, we need to share the access keys.

To control the access rights to encrypted data, we have two options. We can build an access control entity on the server storage side, which must identify incoming users and queries and decide whether they are eligible to exercise a privilege. Alternatively, we can implement access rights control cryptographically, by integrating something like attribute-based encryption.

And there are so many things just waiting to be broken along the way, e.g.:

— authentication process,

— access control records,

— access control execution process.

All of this can also be completely bypassed through an attack from outside the access control threat perimeter if no encryption is in place.

No wonder people stick to simpler means of data protection — implementing cryptography seems like a serious undertaking.

What do we need?

A cryptographic method for enforcing access control for data underneath the typical patterns in client-server relationships would be handy.

Why do we need access control in a database? Having a definite access control system in place allows us to perform reads and searches on the stored data. Let’s assess the known options:

CryptDB allows encrypting the data and executing queries on it afterwards, via a combination of known techniques with limited security.
Homomorphic encryption allows you to encrypt the data in a special way so that the server can run certain mathematical operations on the encrypted data, which could enable the construction of queries and indexing.
ZeroDB et al. allow you to store the encrypted data on the server and transmit a subset of the data every time someone wants to do something with it.
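
To make the homomorphic option less abstract, here is a toy sketch of one additively homomorphic scheme (textbook Paillier). The choice of Paillier, the tiny primes, and all function names are our assumptions for illustration; the parameters are far too small to be secure, and a real deployment would rely on a vetted library.

```python
import math
import secrets

# Toy Paillier parameters: illustrative only, far too small to be secure.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
PHI = (P - 1) * (Q - 1)
MU = pow(PHI, -1, N)  # modular inverse of PHI modulo N (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N under the public modulus N."""
    if not 0 <= m < N:
        raise ValueError("message out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1  # random value in 1..N-1
        if math.gcd(r, N) == 1:
            break
    # (1 + N)^m mod N^2 equals 1 + m*N, so no extra exponentiation is needed.
    return ((1 + m * N) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Decrypt with the private value PHI (and its inverse MU)."""
    u = pow(c, PHI, N_SQ)
    return ((u - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Server-side step: multiplying ciphertexts adds the hidden plaintexts."""
    return (c1 * c2) % N_SQ


# The "server" combines two ciphertexts without ever seeing 20 or 22.
total = decrypt(add_encrypted(encrypt(20), encrypt(22)))
```

The server only ever calls `add_encrypted`; it never holds `PHI`, so it can compute a sum it cannot read.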

It looks like something is missing here. All these methods focus on solving one problem — how to let people READ and search (because search is a subset of reading, if you think about it) the data securely. But typical database access patterns include much more, don’t they?

At the very least, we want CRUD.

If we can provide (and support) the desired processes cryptographically — while performing all the basic data operations on the client — we can provide the end-to-end protection to not just blobs of data, but to the whole process of data turnover. And then we can build searchable encryption with one of these methods, on top of it.

If we can provide integrity and provable enforcement of access control for the whole CRUD set of permissions, we will have a system that is as strong as the client (which gradually exposes plaintext anyway). Since attacks on sensitive data target the clouds and storage first, this considerably lowers the chances of a leak.

The Way to a Solution

1. Attribute-based encryption

Attribute-based encryption is a type of public-key encryption in which the private key used to decrypt data depends on certain user attributes, such as position, place of residence, or account type.

The idea of attribute-based encryption was first published in 2005 by Amit Sahai and Brent Waters in the paper “Fuzzy Identity-Based Encryption” and later developed in a paper by Vipul Goyal, Omkant Pandey, Amit Sahai, and Brent Waters, “Attribute-Based Encryption for Fine-Grained Access Control of Encrypted Data”.

Attribute-based encryption enables storage of certain data attributes in encrypted form, with fine-grained access control. The purpose of such encryption is to protect sensitive data (in third party databases) only when it is being stored or exported. The attribute encryption is configured at the database level. After an attribute is encrypted, it is encrypted in every database entry. For search requests to the database, encrypted attributes are decrypted.

The problem with attribute-based encryption is that since it only provides protection for stored attributes, additional security measures need to be implemented for transportation of attributes.

Another issue is that we cannot realistically expect to have a single authority centre for the whole infrastructure. Multi-authority attribute-based encryption is more likely to be used in practice, which potentially opens up the system to collusion attacks and makes the users’ privacy more vulnerable.

Also, while attribute-based encryption increases data security, it considerably impacts performance. And when you use attribute-based encryption, you cannot just use binary copy to initialise one server from another — meaning every server in your infrastructure needs to be unique.

2. Access control-based cryptosystem

Cryptography essentially limits the read access to data. However, in more complex cases, the matter goes well beyond the simple read process. Providing read access can be done either by handing over a secret key or by adding asymmetric encryption to the equation so that only a recipient with the matching private key can access the data.

The problem is that the first solution is efficient but easy to leak, while the second is less efficient and harder to maintain.

But what if we combine them? Let’s have a symmetric key and an authentication tag for every record we store. By controlling the secret key, we control reading, but we don’t want to risk an anonymous leak. What we can do is wrap this key with asymmetric encryption for each reader who is allowed to access the record.
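
A minimal sketch of this combination, under loudly stated assumptions: the record is sealed with a per-record symmetric key plus an authentication tag, and that key is wrapped for a reader with a Diffie-Hellman-style exchange. The cipher (an HMAC-SHA256 keystream), the group (`P`, `G`), and every function name here are hypothetical stand-ins chosen so the example runs on the standard library alone. A real system would use a vetted AEAD cipher and public-key scheme instead.

```python
import hashlib
import hmac
import secrets

# Toy stand-ins only: NOT a safe production group or cipher.
P = 2**521 - 1  # a Mersenne prime used as the toy modulus
G = 3


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with an HMAC-SHA256 counter-mode keystream (toy cipher)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(a ^ b for a, b in zip(data[offset:offset + 32], block))
    return bytes(out)


def _subkeys(key: bytes):
    """Derive separate encryption and MAC keys from one symmetric key."""
    return (hmac.new(key, b"enc", hashlib.sha256).digest(),
            hmac.new(key, b"mac", hashlib.sha256).digest())


def seal(key: bytes, plaintext: bytes) -> dict:
    """Encrypt-then-MAC: returns nonce, ciphertext, and authentication tag."""
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(16)
    ciphertext = _xor_stream(enc_key, nonce, plaintext)
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return {"nonce": nonce, "ciphertext": ciphertext, "tag": tag}


def unseal(key: bytes, box: dict) -> bytes:
    """Verify the tag in constant time, then decrypt."""
    enc_key, mac_key = _subkeys(key)
    expected = hmac.new(mac_key, box["nonce"] + box["ciphertext"],
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, box["tag"]):
        raise ValueError("authentication tag mismatch")
    return _xor_stream(enc_key, box["nonce"], box["ciphertext"])


def generate_keypair():
    private = secrets.randbelow(P - 3) + 2  # random value in 2..P-2
    return private, pow(G, private, P)


def _kek(shared: int) -> bytes:
    """Hash the shared secret into a key-encryption key."""
    return hashlib.sha256(shared.to_bytes(66, "big")).digest()


def wrap_key(record_key: bytes, reader_public: int) -> dict:
    """Wrap the record key so only the reader's private key can recover it."""
    eph_private, eph_public = generate_keypair()
    kek = _kek(pow(reader_public, eph_private, P))
    return {"ephemeral_public": eph_public, "box": seal(kek, record_key)}


def unwrap_key(wrapped: dict, reader_private: int) -> bytes:
    kek = _kek(pow(wrapped["ephemeral_public"], reader_private, P))
    return unseal(kek, wrapped["box"])


# Usage: one record, one authorised reader.
record_key = secrets.token_bytes(32)
record = seal(record_key, b"patient #17: blood type A+")
alice_private, alice_public = generate_keypair()
record["wrapped_keys"] = {"alice": wrap_key(record_key, alice_public)}

recovered_key = unwrap_key(record["wrapped_keys"]["alice"], alice_private)
plaintext = unseal(recovered_key, record)
```

Granting read access to another user means adding one more entry to `wrapped_keys`; the record itself is not re-encrypted.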

To authenticate write/update, we can do the following:

enable the client to calculate an update tag,
compare the update tags to prove that the updater has the right one.

To authenticate the create process, all we need is a parent/child relationship and the write/update privilege for the parent entity.

Authentication for the delete process would be the same as for update, applied to both parent and child (zero-fill the data, remove the link from the parent).
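
The update, create, and delete checks described above can be sketched as follows. How the update tag is computed is our assumption (an HMAC over the record id under a per-record write key), and the `Store` class and its method names are hypothetical. Note that a static tag like this behaves as a bearer token; a real protocol would probably bind it to a fresh challenge or to the new content.

```python
import hashlib
import hmac
import secrets


def update_tag(write_key: bytes, record_id: str) -> bytes:
    """Tag that only holders of the record's write key can compute."""
    message = b"update:" + record_id.encode("utf-8")
    return hmac.new(write_key, message, hashlib.sha256).digest()


class Store:
    """Hypothetical server-side store: it keeps tags, never write keys."""

    def __init__(self):
        self.records = {}

    def _authorised(self, record_id: str, presented_tag: bytes) -> bool:
        record = self.records.get(record_id)
        return record is not None and hmac.compare_digest(
            record["tag"], presented_tag)

    def create_root(self, record_id: str, tag: bytes, blob: bytes) -> None:
        self.records[record_id] = {"tag": tag, "blob": blob, "children": set()}

    def update(self, record_id: str, presented_tag: bytes, blob: bytes) -> None:
        if not self._authorised(record_id, presented_tag):
            raise PermissionError("update tag mismatch")
        self.records[record_id]["blob"] = blob

    def create(self, parent_id, parent_tag, child_id, child_tag, blob) -> None:
        # Creating a child is authorised by write access to the parent.
        if not self._authorised(parent_id, parent_tag):
            raise PermissionError("no write privilege on parent")
        self.records[child_id] = {"tag": child_tag, "blob": blob,
                                  "children": set()}
        self.records[parent_id]["children"].add(child_id)

    def delete(self, parent_id, parent_tag, child_id, child_tag) -> None:
        # Deleting needs the update privilege on both parent and child.
        if not (self._authorised(parent_id, parent_tag)
                and self._authorised(child_id, child_tag)):
            raise PermissionError("delete requires parent and child tags")
        child = self.records[child_id]
        child["blob"] = bytes(len(child["blob"]))  # zero-fill the data
        self.records[parent_id]["children"].discard(child_id)  # remove link


# Usage: the blobs stand in for ciphertext produced on the client.
store = Store()
folder_tag = update_tag(secrets.token_bytes(32), "folder")
note_tag = update_tag(secrets.token_bytes(32), "note")
store.create_root("folder", folder_tag, b"<encrypted folder>")
store.create("folder", folder_tag, "note", note_tag, b"<encrypted note>")
store.update("note", note_tag, b"<encrypted note v2>")
store.delete("folder", folder_tag, "note", note_tag)
```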

Another important set of considerations is where to store the crypto ACL, and where to store and how to distribute the keys. Following through with the idea of maximum compartmentalisation, the keys need to be isolated from the data, and the access keys need to be isolated from the storage keys; key discovery must be separated from key storage, and all of these elements must rely on their own security mechanisms.
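
As a rough illustration of this compartmentalisation, the sketch below keeps ciphertext, storage keys, access keys, and key discovery in separate components. All class names and the in-memory dictionaries are hypothetical; in practice each component would be a separate service with its own protection.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class DataStore:
    """Holds ciphertext only; no key material ever lands here."""
    blobs: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class KeyStore:
    """Holds wrapped keys, addressed by opaque key ids."""
    wrapped: Dict[str, bytes] = field(default_factory=dict)

    def put(self, wrapped_key: bytes) -> str:
        key_id = secrets.token_hex(8)
        self.wrapped[key_id] = wrapped_key
        return key_id


@dataclass
class KeyDirectory:
    """Key discovery: maps (user, record) to a key id, never to a key."""
    entries: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def register(self, user: str, record_id: str, key_id: str) -> None:
        self.entries[(user, record_id)] = key_id

    def lookup(self, user: str, record_id: str) -> Optional[str]:
        return self.entries.get((user, record_id))


data = DataStore()
storage_keys = KeyStore()   # keys that encrypt records
access_keys = KeyStore()    # keys that authorise writes
read_directory = KeyDirectory()
write_directory = KeyDirectory()

# Placeholder byte strings stand in for real ciphertext and wrapped keys.
data.blobs["rec-1"] = b"<ciphertext>"
read_directory.register(
    "alice", "rec-1", storage_keys.put(b"<storage key wrapped for alice>"))
write_directory.register(
    "alice", "rec-1", access_keys.put(b"<access key wrapped for alice>"))
```

Compromising the directory alone reveals who can reach which key id, but no key material and no data.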

Scheme #4: Implementing practical end-to-end encryption with access control in place.

3. Search

This still leaves search as an open issue. While a lot of advanced research on searchable encryption is under way, most of these methods have practical drawbacks. We’ve outlined some approaches above, but there’s one more: limiting the search process to non-sensitive data, or performing search on tokenized data while keeping the plaintext protected.

In many cases, this will suffice.
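
One way to read “search on tokenized data” is a keyed-hash index: the client turns each searchable value into an HMAC token, and the server matches tokens without seeing plaintext. The sketch below assumes exactly that; `search_token`, `TokenIndex`, and the normalisation rule are our hypothetical choices. It supports equality search only, and equal values produce equal tokens, so the server still learns which records share a value.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict


def search_token(index_key: bytes, field_name: str, value: str) -> str:
    """Client side: derive a deterministic token for one field value."""
    normalised = value.strip().lower().encode("utf-8")
    message = field_name.encode("utf-8") + b"\x00" + normalised
    return hmac.new(index_key, message, hashlib.sha256).hexdigest()


class TokenIndex:
    """Hypothetical server-side index: sees tokens and record ids only."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, token: str, record_id: str) -> None:
        self._postings[token].add(record_id)

    def find(self, token: str) -> set:
        return set(self._postings.get(token, ()))


# Usage: the index key never leaves the clients.
index_key = secrets.token_bytes(32)
index = TokenIndex()
index.add(search_token(index_key, "city", "Kyiv"), "rec-1")
index.add(search_token(index_key, "city", "Lviv"), "rec-2")
index.add(search_token(index_key, "city", "kyiv "), "rec-3")

matches = index.find(search_token(index_key, "city", "KYIV"))
```

Without `index_key`, an observer of the index cannot compute tokens for guessed values, which is the main advantage over a plain hash.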

4. Key management

Rotating and revoking keys and user privileges has always been a challenge for cryptographic research. Successful key management is crucial for the overall security of an infrastructure, but the sheer number of operations needed for the authorisation mechanism to function correctly and securely (keys need correct generation, protection, storage, handling, secure deletion, etc.) makes it far from trivial. In a cloud environment things get even more complicated. Although some practical solutions and implementations exist, they’re mainly aimed at the non-E2EE approach.

So, what should we do?

Should we build our own systems or drive the cryptographic industry towards mitigating threats in other segments of sensitive data protection?

While the end-to-end approach to interpersonal communication advances slowly and inexorably, the idea that end-to-end encryption can be a valid solution for protecting permissions and write operations is still a strange and unfamiliar one. But it needn’t be. The notion that there is a known and proven, albeit slightly esoteric, way of making your perimeter more secure should give way to a different one: “there is a known, proven, and practical method of securing the trusted perimeter, so let’s use it whenever possible”.

Certain issues, mostly related to performance and bandwidth, remain to be solved or, more correctly, optimised. Still, together with the enforcement of cryptographic access control, this is precisely what the next logical step in the evolution of large systems can (and, in our opinion, should) be. Yes, it might get messy in the beginning, but that’s evolution at work, after all.

If you have a story to share — we’d love to hear from you! Please reach out to us via [email protected] or @cossacklabs.