Suppose Peggy needs to prove to Victor that she is in possession of a secret without revealing the secret. Can she do so in a way that convinces Victor that she really does know the secret? This is the question at the heart of one of the most powerful cryptographic processes we can employ in identity systems: zero-knowledge proofs (ZKPs).
Suppose for example that Peggy has a digital driver's license and wants to prove to Victor, the bartender, that she's over 21 without just handing over her driver's license or even showing him her birthdate. ZKPs allow Peggy to prove her driver's license says she's at least 21 in a way that convinces Victor without Peggy having to reveal anything else (i.e., there's zero excess knowledge).
This problem was first explored by MIT researchers Shafi Goldwasser, Silvio Micali, and Charles Rackoff in the 1980s as a way of combatting information leakage. The goal is to reduce the amount of extra information the verifier, Victor, can learn about the prover, Peggy.
One way to understand how ZKPs work is the story of the Cave of Alibaba, first published by cryptographers Quisquater, Guillou, and Berson. The following diagram provides an illustration.
The Cave of Alibaba has two passages, labeled A and B, that split off a single passageway connected to the entrance. Peggy possesses a secret code that allows her to unlock a door connecting A and B. Victor wants to buy the code but won't pay until he's sure Peggy knows it. Peggy won't share it with Victor until he pays.
The algorithm for Peggy to prove she knows the code proceeds as follows:

1. Victor waits at the entrance while Peggy goes into the cave and randomly chooses passage A or B.
2. Victor walks to the fork and calls out the letter of the passage, A or B, he wants Peggy to return by.
3. Peggy returns by the passage Victor named, using the secret code to open the door if she isn't already in that passage.
4. Peggy and Victor repeat these steps as many times as Victor needs to be convinced.
If Peggy can always come back by whichever passageway Victor selects, then there is an increasing probability that Peggy really knows the code. After 20 tries, there's less than one chance in a million that Peggy is simply guessing which letter Victor will call. This constitutes a probabilistic proof that Peggy knows the secret.
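The arithmetic is easy to check. The following Python sketch (an illustration, not part of the original protocol description) simulates an impostor who doesn't know the code: she can only exit by the passage Victor calls if she happened to enter it, so each round she survives with probability 1/2.

```python
import random

def impostor_survives(rounds: int) -> bool:
    """Simulate an impostor who doesn't know the door code.

    Each round she picks a passage at random; she can only come out the
    passage Victor calls if she happens to be in it already (probability 1/2).
    """
    for _ in range(rounds):
        peggy_choice = random.choice("AB")  # passage the impostor entered
        victor_call = random.choice("AB")   # passage Victor asks her to exit by
        if peggy_choice != victor_call:     # without the code, she's stuck
            return False
    return True

# Estimate how often an impostor survives 20 rounds of challenges.
trials = 1_000_000
successes = sum(impostor_survives(20) for _ in range(trials))
print(f"impostor fooled Victor in {successes} of {trials:,} trials")
print(f"theoretical chance: 1 in {2**20:,}")  # (1/2)^20, less than one in a million
```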
This algorithm not only allows Peggy to convince Victor she knows the code, but it does it in a way that ensures Victor can't convince anyone else Peggy knows the code. Suppose Victor records the entire transaction. The only thing an observer sees is Victor calling out letters and Peggy emerging from the right tunnel. The observer can't be sure Victor and Peggy didn't agree on a sequence of letters in advance to fool observers.
Note that this property relies on the algorithm using a good pseudo-random number generator with a high-entropy seed so that Peggy and third-party observers can't predict Victor's choices.
Thus, while Peggy cannot deny to Victor that she knows the secret, she can deny that she knows the secret to other third parties. This ensures that anything she proves to Victor stays between them and Victor cannot leak it—at least in a cryptographic way that proves it came from Peggy. Peggy retains control of both her secret and the fact that she knows it.
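One way to see why a recording convinces no one is to compare transcripts. The sketch below (again, my illustration rather than part of the protocol) produces a transcript of a genuine run and a transcript Peggy and Victor fabricate by agreeing on the challenges in advance; the two are indistinguishable to an observer.

```python
import random

def genuine_transcript(rounds: int) -> list[tuple[str, str]]:
    """Victor picks random challenges; Peggy, who knows the code,
    always exits by the passage he calls."""
    transcript = []
    for _ in range(rounds):
        challenge = random.choice("AB")
        transcript.append((challenge, challenge))  # (Victor's call, Peggy's exit)
    return transcript

def forged_transcript(rounds: int) -> list[tuple[str, str]]:
    """Peggy and Victor agree on the challenge sequence beforehand,
    so Peggy can 'succeed' without knowing the code."""
    agreed = [random.choice("AB") for _ in range(rounds)]
    return [(challenge, challenge) for challenge in agreed]

# Both transcripts are lists of (call, exit) pairs where the exit always
# matches the call, so a recording proves nothing to a third party.
print(genuine_transcript(5))
print(forged_transcript(5))
```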
When we say "zero knowledge" and talk about Victor learning nothing beyond the proposition in question, that's not perfectly true. In the Cave of Alibaba, Peggy proves in zero knowledge that she knows the secret. But there are many other things that Victor learns about Peggy that ZKPs can do nothing about. For example, Victor knows that Peggy can hear him, speaks his language, walks, and is cooperative. He also might learn things about the cave, like approximately how long it takes to unlock the door. Peggy learns similar things about Victor. So, the reality is that the proof is approximately zero knowledge, not perfectly zero knowledge.
The example of Alibaba's Cave is a very specific use of ZKPs, what's called a zero-knowledge proof of knowledge. Peggy is proving she knows (or possesses) something. More generally, Peggy might want to prove many facts to Victor. These could include propositional phrases or even values. ZKPs can do that as well.
To understand how we can prove propositions in zero knowledge, consider a different example, sometimes called the Socialist Millionaire Problem. Suppose Peggy and Victor want to know if they're being paid a fair wage. Specifically, they want to know whether they are paid the same amount, but they don't want to disclose their specific hourly rate to each other or even to a trusted third party. In this instance, Peggy isn't proving she knows a secret; rather, she's proving an equality (or inequality) proposition.
For simplicity, assume that Peggy and Victor are being paid one of $10, $20, $30, or $40 per hour.
The algorithm works like this:

1. Peggy gets four lockable boxes and labels them $10, $20, $30, and $40.
2. Peggy keeps the key to the box labeled with her salary and throws away the keys to the other three boxes.
3. Out of Peggy's sight, Victor puts a slip of paper marked "+" into the box labeled with his salary and slips marked "-" into the other three, leaving all four boxes locked.
4. Peggy opens the one box she still has a key for. A "+" inside means their salaries are equal; a "-" means they aren't.
5. Peggy shows Victor only the slip she found, so both learn whether their salaries match and nothing more.
This is called an oblivious transfer and proves the proposition VictorSalary = PeggySalary true or false in zero knowledge (i.e., without revealing any other information).
For this to work, Peggy and Victor must trust that the other will be forthcoming and state their real salary. Victor needs to trust that Peggy will throw away the three other keys. Peggy must trust that Victor will put only one slip with a "+" on it in the boxes.
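Here's a small Python simulation of the box-and-slip exchange. It only models the physical analogy (the function sees both salaries, so it isn't itself private); the point is that each party walks away knowing only whether the salaries match.

```python
import random

SALARIES = [10, 20, 30, 40]  # possible hourly rates, per the simplification above

def socialist_millionaire(peggy_salary: int, victor_salary: int) -> bool:
    """Simulate the box protocol: returns True iff the salaries match.

    In the physical analogy, neither party learns anything beyond this bit.
    """
    # Peggy labels four boxes and keeps only the key for her salary's box.
    peggy_key = peggy_salary

    # Victor puts a '+' slip in the box labeled with his salary, '-' in the rest.
    boxes = {label: ("+" if label == victor_salary else "-") for label in SALARIES}

    # Peggy opens the only box she can and reports just the sign she finds.
    slip = boxes[peggy_key]
    return slip == "+"

peggy, victor = random.choice(SALARIES), random.choice(SALARIES)
print("same salary?", socialist_millionaire(peggy, victor))
```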
Just like digital certificates need a PKI to establish confidence beyond what would be possible with self-issued certificates alone, ZKPs are more powerful in a system that allows Peggy and Victor to prove facts from things others say about them, not just what they say about themselves. For example, rather than Peggy and Victor self-asserting their salary, suppose they could rely on a signed document from the HR department in making their assertion so that both know that the other is stating their true salary. Verifiable Credentials provide a system for using ZKPs to prove many different facts alone or in concert, in ways that give confidence in the method and trust in the data.
In the previous examples, Peggy was able to prove things to Victor through a series of interactions. For ZKPs to be practical, interactions between the prover and the verifier should be minimal. Fortunately, a technique called a SNARK allows for non-interactive zero-knowledge proofs.
SNARKs have the following properties (from whence they derive their name):

- Succinct: the proofs are small and can be verified quickly.
- Non-interactive: the proof is a single message from the prover to the verifier; no back-and-forth is required.
- ARgument: the proof is computationally sound, so a computationally bounded prover can't fake it.
- of Knowledge: the proof shows the prover actually knows the witness, not merely that one exists.
You'll typically see "zk" (for zero-knowledge) tacked on the front to indicate that during this process, the verifier learns nothing other than the facts being proved.
The mathematics underlying zkSNARKs involves homomorphic computation over high-degree polynomials. But we can understand how zkSNARKs work without knowing the underlying mathematics that ensures that they're sound. If you'd like more details of the mathematics, I recommend Christian Reitwiessner's "zkSNARKs in a Nutshell".
As a simple example, suppose Victor is given a sha256 hash, H, of some value. Peggy wants to prove that she knows a value s such that sha256(s) == H without revealing s to Victor. We can define a function C that captures the relationship:

C(x, w) = ( sha256(w) == x )

So, C(H, s) == true, while other values for w will return false.
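In Python, this relation is just a few lines (a sketch using the standard hashlib library; the variable names follow the text):

```python
import hashlib

def C(x: bytes, w: bytes) -> bool:
    """Return True when w is a preimage of the sha256 hash x."""
    return hashlib.sha256(w).digest() == x

s = b"my secret value"          # Peggy's secret
H = hashlib.sha256(s).digest()  # the hash Victor is given

print(C(H, s))           # True: the secret satisfies the relation
print(C(H, b"a guess"))  # False: other witnesses don't
```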
Computing a zkSNARK requires three functions: G, P, and V. G is the key generator that takes a secret parameter called lambda and the function C and generates two public keys, the proving key pk and the verification key vk. They need only be generated once for a given function C. The parameter lambda must be destroyed after this step since it is not needed again and anyone who has it can generate fake proofs.
The prover function P takes as input the proving key pk, a public input x, and a private (secret) witness w. The result of executing P(pk, x, w) is a proof, prf, that the prover knows a value for w that satisfies C.
The verifier function V computes V(vk, x, prf), which is true if the proof prf is correct and false otherwise.
Returning to Peggy and Victor, Victor chooses a function C representing what he wants Peggy to prove, creates a random number lambda, and runs G to generate the proving and verification keys:

(pk, vk) = G(C, lambda)

Peggy must not learn the value of lambda. Victor shares C, pk, and vk with Peggy.
Peggy wants to prove she knows the value s that satisfies C for x = H. She runs the proving function P using these values as inputs:

prf = P(pk, H, s)

Peggy presents the proof prf to Victor, who runs the verification function:

V(vk, H, prf)

If the result is true, then Victor can be assured that Peggy knows the value s.
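The Python sketch below shows the shape of this exchange. The G, P, and V here are placeholders that only mimic the interface described above so the data flow is concrete; they are not succinct, not sound, and not zero knowledge. A real system would get these three functions from a zkSNARK library.

```python
import hashlib
import hmac
import os

def C(x: bytes, w: bytes) -> bool:
    """The relation Peggy wants to prove she can satisfy: sha256(w) == x."""
    return hashlib.sha256(w).digest() == x

# Placeholder G, P, V: they illustrate who computes what with which inputs.
# A real zkSNARK library implements these very differently.

def G(C, lam: bytes):
    """Key generator: derive a proving key and a verification key from lambda."""
    pk = hashlib.sha256(b"zk-demo-key" + lam).digest()
    vk = pk  # in this toy the keys coincide; real schemes produce distinct keys
    return pk, vk

def P(pk: bytes, x: bytes, w: bytes) -> bytes:
    """Prover: check the witness locally, then emit a stand-in 'proof'."""
    if not C(x, w):
        raise ValueError("witness does not satisfy C")
    return hmac.new(pk, x, hashlib.sha256).digest()  # NOT zero knowledge or sound

def V(vk: bytes, x: bytes, prf: bytes) -> bool:
    """Verifier: accept or reject using only vk, the public input x, and prf."""
    expected = hmac.new(vk, x, hashlib.sha256).digest()
    return hmac.compare_digest(prf, expected)

# Victor's setup: choose C, generate the keys, then destroy lambda.
lam = os.urandom(32)
pk, vk = G(C, lam)
del lam

# Peggy's side: she knows s and produces a proof for x = H.
s = b"Peggy's secret"
H = hashlib.sha256(s).digest()
prf = P(pk, H, s)

# Victor's side: he checks the proof without ever seeing s.
print(V(vk, H, prf))  # True
```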
The function C does not need to be limited to a hash, as it was in this example. Within the limits of the underlying mathematics, C can be quite complicated and involve any number of values that Victor would like Peggy to prove, all at one time.
This article is an excerpt from my book Learning Digital Identity, published by O’Reilly Media.