The hype is dead, long live the hype. After deep learning, a new entry is about ready to go on stage. The usual journalists are warming up their keyboards for blogs, news feeds, tweets, in one word, hype. This time it’s all about privacy and data confidentiality. The new words, . homomorphic encryption For the record, I am not personally against such a technology - quite the opposite I think it is very powerful and clever, rather against the misleading claims that usually make more followers than the technology itself. The purpose of this post is to shed light on homomorphic encryption, its benefits, and limitations, being as impartial as possible. What is Homomorphic Encryption? Homomorphic encryption (HE) is an encryption scheme that allows one to compute something like . While such operation does not shock anyone per se, as a matter of fact, operands , , and result are disclosed. Boom! Encrypted(2) + Encrypted(3) = Encrypted(5) 2 3 5 never The mathematics behind HE refers to the concept of in algebra, a structure-preserving map between two algebraic structures of the same type. Basically, both encryption and decryption functions can be thought of as homomorphisms between plaintext and ciphertext spaces. This definition explains why the sum of two operands in the plaintext space is preserved in the ciphertext space too. homomorphism All computations within an HE setting are represented as either or circuits, depending on the type of computation to represent and the types of gates allowed in each one. Feel free to expand on this at the links provided above. Boolean arithmetic The theory behind HE is not new (as the theory behind deep learning), being the first HE scheme proposed in the late 70s. Since then, many flavors of homomorphic encryption have been proposed, populating the literature with three major schemes recognized by the research community: , supporting the evaluation of circuits of only one type of gate (e.g. only additions or only multiplications) partially HE , capable of evaluating two types of gates somewhat HE , capable of evaluating arbitrary circuits (becoming the most valuable and powerful of the three) fully HE From a practical perspective, schemes are limited to evaluating low-degree polynomials over encrypted data. Such a scheme is limited because each ciphertext is noisy, and such noise grows as one keeps performing additions and multiplications in the ciphertext space. somewhat homomorphic encryption At some point, the amount of noise that gets  accumulated becomes so high that the resulting ciphertext becomes indecipherable. The presence of noise is essential to guarantee a certain degree of security and protecting the secret operands and results of intermediate computations. As in classic encryption, the value to be protected is spread in a large noisy space. The role of the secret key is to decipher and reconstruct securely the encrypted components of a computation. The aforementioned space is usually a very very large polynomial with a degree of 10K or more. In such a context, each coefficient is represented by more than 600 bits. This explains how large a dataset would become after encrypting it with HE schemes to guarantee industrial security. Obviously, one could trade security for a lower degree polynomial and a smaller cyphertext. For many realistic use cases, the only flavor of HE that is considered is . Performing additions and multiplications in an encrypted state are two essential operations to build all other non-primitive operations that are necessary to represent any type of computation. fully HE The ultimate goal of such a scheme would be to be capable of performing arbitrary circuits and essentially represent a Turing complete framework. Why should you care about FHE? Because under an FHE scheme no operand and no result is ever disclosed, highly regulated environments are the first to benefit of such a powerful property. Banks, social media platforms and pharmaceutical companies running their business in Amazon Web Services are in fact giving away their secrets. Because . As a matter of fact, doing computation on a system that is being managed by third parties, means that the has access to your data. “There’s no cloud. It’s just someone else’s computer” BOFH While technologies like AMD’s Secure Encrypted Virtualization (SEV) implemented at the level of the CPU, encrypt the entire CPU state (basically all the registers) with keys that are not accessible to the other guests, many other use cases require more sophisticated approaches to the problem of data confidentiality. For instance, the requirements for private machine learning go beyond protecting the CPU state and register set. Homomorphic encryption in the real world Some practical - but nonexhaustive - use cases that would be possible with FHE schemes are listed below: (e.g. sending both data and computation to Amazon EC2 instances and being sure nobody can access them) (already possible with SEV-ES and orders of magnitude more efficient) secure cloud outsourcing making the without disclosing the results (e.g. find common records in two private datasets, without revealing any of the datasets, nor the common records if any) intersection of private sets training models on private data without revealing the models machine learning without revealing the query (nor the results) searching Performance and accuracy Security and confidentiality usually come at a very high cost in terms of performance. Highly regulated environments that are prone to trade performance for maximum security might be interested in adopting technology like FHE. But what are these costs? Benchmarks (the least scientific version of them) measured from the most widely used family of models in data science, linear regression, with a bit more than a dozen variables are provided below. For the data scientist who is used to train a model in about 1 hour, the same model under the FHE setting would take about 2 days. The same model requiring 2 GB of memory, would need about 30GB. Still doable, if that model is logistic regression. Let’s see how long training a neural network would take. A relatively small neural network that converges in approximately 12 hours, would take about 15 days. And the memory requirements would grow from 10GB to 150GB. However, despite the large divergence in terms of computational cost and memory usage, the accuracy degradation from computation in plaintext to one in cyphertext is negligible. The slight drop in accuracy is usually due to float-to-integer conversions. Each operation performed on a floating-point value decreases its accuracy a little for additions and a bit more for multiplications. Sometimes those overheads are acceptable in some domains e.g. analyzing financial data that are siloed among retail, loan, and investment banking. In many other domains (especially those in which a near real-time prediction is necessary), FHE could not even remotely be considered. Conclusion Homomorphic encryption is a very interesting field of research that - like deep learning - has been overlooked in the past years. Different market conditions and innovative findings in tech are making some of the FHE assumptions possible today. I am personally observing many analogies with what happened with back-propagation and neural networks in the 90s. An analogy I would definitely not like to see is the hype around a technology that not only is not ready and but is also limited by the theory of mathematics. If you like my posts, feel free to visit my personal page at https://codingossip.github.io References Towards a Homomorphic Machine Learning Big Data Pipeline for the Financial Services Sector IBM Fully Homomorphic Encryption Toolkit for Linux Previously published at https://codingossip.github.io/2020/machine-learning-with-homomorphic-encryption/

Amazon

NO! GPT-3 Will Not Steal Your Programming Job

Listen to my podcast

Read My Stories

Too Long; Didn't Read

Download free Tech Leaders Productivity Report!

Why You Should Care About Homomorphic Encryption

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

Untitled Story

Gradient descent vs coordinate descent

The Noonification: Use This 7-Step McKinsey Framework to Solve Any Problem (1/10/2023)

The Noonification: A Taxonomy of Inclusiveness (1/11/2024)

The Noonification: What is the InfiniteNature-Zero AI Model? (11/19/2022)

10 Ways AI Has Changed Our Lives

Gradient descent vs coordinate descent

The Noonification: Use This 7-Step McKinsey Framework to Solve Any Problem (1/10/2023)

The Noonification: A Taxonomy of Inclusiveness (1/11/2024)

The Noonification: What is the InfiniteNature-Zero AI Model? (11/19/2022)

10 Ways AI Has Changed Our Lives

Light-Mode

Classic

Newspaper

Dark-Mode

Neon Noir

Minty

HN StartUps