PVC Security: A Diplomacy-free Solution to Safe Data Sharing Between Rational Hostiles

Written by alibabatech | Published 2019/03/21

This article is part of the Academic Alibaba series and is taken from the paper entitled “Covert Security with Public Verifiability: Faster, Leaner, and Simpler” by Cheng Hong, Jonathan Katz, Vladimir Kolesnikov, Wen-jie Lu and Xiao Wang, accepted by Eurocrypt 2019. The full paper can be read here.

How can mutually mistrustful parties collaborate to reap the full value of their collective data?

This is a pertinent question in the Big Data era. Powerful computing capabilities offer the potential to extract all sorts of valuable insights from data, valuable enough that some are even claiming data is the new oil. But just like oil, data cannot generate value unless it flows.

Most data does not flow, because most firms have a highly cautious attitude toward data sharing. They are right to do so, due to data confidentiality and personal privacy concerns, but this means that mutually mistrustful parties are denied insights that could be produced by analyzing their data jointly.

Since nobody expects competing enterprises to resolve their differences any time soon, the problem boils down to a technical one: secure multi-party computation (MPC) protocols.

Addressing this issue, an international research team with members from Alibaba and several US and Japanese universities has proposed a new MPC protocol that offers something called publicly verifiable covert (PVC) security. PVC security requires zero diplomacy between hostile parties; in fact, it exploits the same self-interest that lies behind dishonest behavior when the opportunity for industrial espionage arises during data sharing. It does this by automatically generating a publicly verifiable certificate of misbehavior if any party so much as attempts to break the rules, permanently exposing that party’s untrustworthiness to the world.

The research team claims that while other approaches to PVC have failed to catch on due to their high overheads, bulky certificates, and general complexity, their faster, leaner, simpler approach is the first truly viable solution. Meanwhile, the world-renowned International Association for Cryptologic Research has recognized the significance of the research by accepting the paper for its flagship Eurocrypt conference in 2019.

But why is a certificate of poor trustworthiness necessary in the first place? The simple answer is that while all security protocols attempt to make it impossible to break the rules to begin with, they are limited in how effectively they can do this. Specifically, protocols in the category of “semi-honest security” provide efficient security as long as all parties stick to the rules of the protocol, while those in the “malicious security” category protect against the most devious of them — but are extremely difficult to implement.

PVC security, the team claims, is the best of both worlds. It is a form of semi-honest security that protects against the majority of attacks and implements the certificate mechanism to disincentivize or penalize the most malicious attackers — so long as they are only malicious, and not also nihilistic.

To understand why the team arrived at this technical compromise, it is necessary to go back to where both semi-honest and malicious security models began: the basics of MPC.

Secure Multi-party Computation

In 1986, Chinese Academician Andrew Chi-Chih Yao proposed a method of secure multi-party computation (MPC) for performing calculations on behalf of multiple mutually distrusting parties. MPC allows each party to provide their input and receive the final output without seeing the input of the other parties. In principle, this provides a way of deriving value from data without having to disclose that data to a third party.

With MPC, hostile parties can share insights without sharing their data

There are two main approaches to implementing MPC: garbled circuits (GC) and secret sharing. The solution adopted by the research team is based on GC.

Garbled circuit

Any computable function can ultimately be represented as a circuit (built from adders, multipliers, shifters, selectors, and so on), and such circuits can be composed of only AND and XOR logic gates.

A gate is actually a truth table. For example, the truth table of an AND gate is:

| Input 1 | Input 2 | Output |
| --- | --- | --- |
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

In this example, the bottom-right cell indicates that if both input wires have a value of 1, the output wire's value is also 1 (that is, 1 AND 1 = 1).

Now, assume that each possible value of each wire is represented by a different random key: say the first input wire's values 0 and 1 become keys a and b, the second input wire's values become c and d, and the output wire's values become e and f. The truth table now looks like this:

| Input 1 | Input 2 | Output |
| --- | --- | --- |
| a | c | Encrypt_{a,c}(e) |
| a | d | Encrypt_{a,d}(e) |
| b | c | Encrypt_{b,c}(e) |
| b | d | Encrypt_{b,d}(f) |

In this example, the bottom-right cell indicates that if the gate inputs are b and d, then the encrypted f is the output (the encryption keys are b and d). This gate remains the same from the perspective of the control flow; the only difference is that the inputs and output are encrypted, and the output can be decrypted only by using the corresponding inputs. The decrypted f can then be used as the input to a subsequent gate.

It is this encryption method that is called a garbled circuit. By encrypting all the gates in a circuit in order using this method, we obtain the function represented as a GC: it receives encrypted inputs and produces an encrypted result.
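To make this concrete, here is a minimal Python sketch of garbling the AND gate above, using hash-derived pads as the encryption. This is purely illustrative (real schemes shuffle the rows and use optimized constructions such as point-and-permute, mentioned later); the key names mirror a through f from the table.

```python
import hashlib, os

def H(*parts):
    # Hash the concatenation of the given byte strings.
    h = hashlib.sha256()
    for x in parts:
        h.update(x)
    return h.digest()

# One random 16-byte key per possible wire value:
w1 = [os.urandom(16), os.urandom(16)]   # first input wire:  0 -> a, 1 -> b
w2 = [os.urandom(16), os.urandom(16)]   # second input wire: 0 -> c, 1 -> d
out = [os.urandom(16), os.urandom(16)]  # output wire:       0 -> e, 1 -> f

# Each row encrypts the correct output key under the two input keys.
garbled_table = [
    bytes(p ^ k for p, k in zip(H(w1[x], w2[y])[:16], out[x & y]))
    for x in (0, 1) for y in (0, 1)
]

# Holding exactly the keys b and d (the "1" keys), an evaluator can
# recompute the pad H(b, d) and recover f, and only f.
pad = H(w1[1], w2[1])[:16]
assert bytes(p ^ c for p, c in zip(pad, garbled_table[3])) == out[1]
```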

Assume that two parties A and B provide data a and b, respectively, and wish to compute an agreed function F(a, b) securely. A GC-based secure two-party computation protocol can be described as follows (a minimal code sketch follows the list):

1. A encrypts F to obtain the GC-represented function “GCF” (note that A is the generator of the circuit, so it knows the key of each wire);

2. A encrypts its input “a” using the corresponding wire keys in step 1 to obtain Encrypt(a);

3. A sends Encrypt(a) and GCF to B;

4. A encrypts B’s input “b” using the corresponding wire keys in step 1 to obtain Encrypt(b), and sends Encrypt(b) to B;

5. B has a complete GC and all inputs, so it can run the circuit to obtain an encrypted output;

6. A sends the keys of the output wires to B, and B decrypts them to obtain the final result F(a, b);

7. If A requires it, B then sends F(a, b) to A.
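Putting the steps together, here is a minimal, self-contained sketch of the flow for the one-gate function F(a, b) = a AND b on single-bit inputs. Step 4 is stubbed out as a direct key lookup; as discussed next, the real protocol uses oblivious transfer for this step. All names are illustrative.

```python
import hashlib, os, random

def H(*parts):
    h = hashlib.sha256()
    for x in parts:
        h.update(x)
    return h.digest()

def xor16(x, y):
    return bytes(i ^ j for i, j in zip(x[:16], y[:16]))

a_input, b_input = 1, 1  # A's bit and B's bit

# Step 1: A garbles F, learning the keys of every wire.
wa = [os.urandom(16), os.urandom(16)]   # keys for A's input wire
wb = [os.urandom(16), os.urandom(16)]   # keys for B's input wire
wo = [os.urandom(16), os.urandom(16)]   # keys for the output wire
GCF = [(H(wa[x], wb[y], b"tag"), xor16(H(wa[x], wb[y], b"pad"), wo[x & y]))
       for x in (0, 1) for y in (0, 1)]
random.shuffle(GCF)  # hide which row corresponds to which inputs

# Steps 2-3: A "encrypts" its input a by picking the matching wire key,
# then sends Encrypt(a) and GCF to B.
enc_a = wa[a_input]

# Step 4 (stub): B obtains the key for its own bit. In the real protocol
# this is an oblivious transfer, so A never learns b_input.
enc_b = wb[b_input]

# Step 5: B evaluates the circuit with the two keys it holds; each row's
# tag lets B find the single row it can decrypt.
tag, pad = H(enc_a, enc_b, b"tag"), H(enc_a, enc_b, b"pad")
enc_out = next(xor16(pad, ct) for t, ct in GCF if t == tag)

# Step 6: A sends the output-wire keys and B decodes the result.
print("F(a, b) =", wo.index(enc_out))  # 1, since 1 AND 1 = 1
```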

A scrupulous reader might ask: in step 4, doesn't A handle B's input "b" directly? Is this not a violation of secure multi-party computation?

The answer is yes — unless you implement something called “oblivious transfer”.

Oblivious transfer

Oblivious transfer (OT) can be thought of as a means of solving the seemingly impossible conundrum where two parties wish to engage in a transaction, but are unwilling to share the sensitive information that is necessary to facilitate that transaction.

Suppose that a travel agency has travel information for n attractions and holidaymaker Joe Bloggs wants to visit attraction A. Joe wants to purchase some relevant materials from the travel agency to prepare for his trip. But he is concerned about his privacy, and does not want to disclose to the travel agency where he is going.

In other words, both parties hope that the transaction can be carried out under the following privacy conditions:

1. Joe does not disclose to the travel agency the message “I am going to attraction A”.

2. The travel agency only provides the information for attraction A that Joe pays for, without revealing the other available information.

These conditions seem impossible to satisfy. If the travel agency exclusively provides information on attraction A to Joe, it will implicitly understand that Joe is likely to visit that attraction. The only way to protect Joe’s privacy, short of him purchasing the information on all available attractions, would be for the travel agency to provide all the information to him, saving him the need to specify an attraction. But this of course runs contrary to the agency’s interests.

OT provides a way of allowing this transaction to proceed under the stated conditions. Under the OT protocol, the travel agency encrypts the n pieces of data it owns using an encryption algorithm with parameters agreed upon by both parties, and then sends them to Joe. Joe can decrypt the data relating to attraction A from the ciphertext, but cannot decrypt the remaining (n-1) pieces of data.

Taking n=2 as an example, a description of a 1-out-of-2 OT implementation is given below, based on the Diffie-Hellman key exchange protocol. In this example, S (sender) is the travel agency and R (receiver) is Joe Bloggs. S owns two pieces of data (M0 and M1), and R hopes to obtain M0.

1. S secretly generates a random number a; R secretly generates a random number b;

2. S sends A = g^a to R; R sends B = g^b to S;

3. S computes k0 = Hash(B^a) and k1 = Hash((B/A)^a);

4. S encrypts M0 with k0 and M1 with k1, i.e. e0 = Encrypt_k0(M0) and e1 = Encrypt_k1(M1), and then sends e0 and e1 to R;

5. Since B^a = A^b, R can compute k0 and decrypt M0, but R cannot compute k1, hence it cannot decrypt M1.

If R wishes to obtain M1 instead, he just needs to send B = A·g^b rather than B = g^b in step 2.
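Here is a minimal runnable sketch of this 1-out-of-2 OT, assuming a toy Diffie-Hellman group (the prime is far too small for real security), SHA-256 as Hash, and XOR with the derived key as Encrypt. All parameter choices are illustrative.

```python
import hashlib, random

# Toy group parameters (NOT secure sizes; for illustration only).
p = 2**127 - 1   # a Mersenne prime
g = 3

def H(x: int) -> bytes:
    return hashlib.sha256(x.to_bytes(16, "big")).digest()

def xor(key: bytes, msg: bytes) -> bytes:
    return bytes(k ^ m for k, m in zip(key, msg))

M0, M1 = b"guide to attraction A", b"guide to attraction B"
c = 0  # R's secret choice: 0 for M0, 1 for M1

# Steps 1-2: S sends A = g^a; R sends B = g^b, shifted by A if c == 1.
a = random.randrange(2, p - 1)
A = pow(g, a, p)
b = random.randrange(2, p - 1)
B = pow(g, b, p) if c == 0 else (A * pow(g, b, p)) % p

# Step 3: S derives both keys; only one will be computable by R.
k0 = H(pow(B, a, p))
inv_A = pow(A, p - 2, p)            # 1/A mod p, via Fermat's little theorem
k1 = H(pow(B * inv_A % p, a, p))

# Step 4: S encrypts both messages (padded to the key length) and sends them.
e0, e1 = xor(k0, M0.ljust(32)), xor(k1, M1.ljust(32))

# Step 5: R computes its single key from A^b and decrypts only M_c.
kc = H(pow(A, b, p))
print(xor(kc, e0 if c == 0 else e1).rstrip())  # b'guide to attraction A'
```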

Oblivious transfer: everyone is happy, and no one is any the wiser

Garbled circuit with oblivious transfer

To recap, the problematic step in the GC computation above was:

A encrypts B’s input “b” using the corresponding wire key in step 1 to obtain Encrypt(b), and sends Encrypt(b) to B

This can be securely solved with OT, where A plays the Sender and B plays the Receiver: B obtains Encrypt(b) from A, but does not disclose the contents of b to A.

From Garbled Circuit to PVC

Only the most basic, crude form of GC has been described above. In practice, many optimizations exist for GC, including Point & Permute, Free XOR, and Half Gates, and in recent years GC-based solutions have come close to practical efficiency. For example, the Hamming distance between two million-dimensional vectors can be calculated using GC-based secure MPC in 1.5 seconds, with no content shared between the two interested parties.

Nevertheless, there are genuine security issues with GC, some of which can even be observed from the examples described above.

Failings of “semi-honest” GC

Keen observers may have noticed the following issue with the example given earlier: how do we ensure that the GCF that A gives to B correctly encodes the agreed function F?

For example, assume that A and B want to compare the sizes of a and b, and agree that F(a, b) = a > b ? 1 : 0. A malicious A could simply garble a different function F' (for example, an F'(a, b) that outputs the left-most bits of b), which would violate B's privacy. And since the function reaches B only in garbled form, B has no way to detect the substitution.
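To see how cheap this attack is, compare the agreed function with a leaky substitute (toy functions, purely illustrative):

```python
# The agreed comparison function...
def F(a, b):
    return 1 if a > b else 0

# ...and a malicious substitute that leaks the left-most 4 bits of a
# 32-bit b. Once garbled, both are just tables of ciphertexts, so B
# cannot tell which one A actually encrypted.
def F_prime(a, b):
    return (b >> 28) & 0xF
```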

This problem is one of malicious behavior on the part of A. GC relies on all participants conforming to a semi-honest behavioral model, whereby they will faithfully execute the procedure according to the protocol steps — at most speculating on the private data of other parties based on what they can infer, but never obtaining it. To use the analogy of a poker game, a semi-honest player might try to infer someone else’s hand from their own hand, but would still follow the rules of the game.

A malicious poker player, on the other hand, would use all means at their disposal — e.g. illegally switching their cards or even stealing others’ cards — to try and win the game.

Previous solutions that harden GC into a malicious security model have ruined its efficiency. The fastest GC-based malicious security solution takes over 10 seconds to calculate the Hamming distance between two million-dimensional vectors; at roughly seven times slower than the equivalent semi-honest solution, this is impractical for large-scale data.

Malicious players do exist, however. For this reason, PVC security borrows notionally from the malicious behavior model to produce a semi-honest solution that incentivizes participants to play by the rules.

The case for the PVC compromise

The key innovation in PVC is that for each action by each participant, a publicly verifiable signature-like mechanism is generated.

This is predicated on the idea that if a participant behaves maliciously, there is a probability of λ that the other participants will find out. λ is called the “deterrent factor”, the reason being that there is a good chance the other participants will publicize the malicious behavior and its signature if they discover it, causing the perpetrator to suffer a reputational loss.

Considering the importance of reputation to data owners — they may never be able to collaborate with others again if their malpractice is publicized — a 50% deterrent is considered sufficient to dissuade rational parties from wrongdoing.

PVC Perfected

The PVC model was originally proposed at Asiacrypt 2012 [2]. An improved version followed at Asiacrypt 2015 [3], but as well as being inefficient, these works supplied only complex theoretical descriptions and were a long way from practical implementation.

The research team's new PVC solution offers a concise protocol with complete code. Crucially, it does this without a major performance cut, taking just 2.5 seconds to calculate the Hamming distance between two million-dimensional vectors.

Assume once again that two participants A and B provide data a and b, respectively, and wish to compute an agreed function F(a, b) securely. Assume also that the deterrent factor λ = 50%. The team's PVC scheme works as follows (a code sketch of the core check follows the analysis below):

1. A selects two random seeds s1 and s2, and A and B run OT so that B randomly obtains one of them without A learning which. (For argument's sake, assume that B obtains s1);

2. A generates GC1 and GC2 from s1 and s2, respectively;

3. A and B run OT for B to obtain the encrypted values of B's input wires in GC1 (as described in step 8, GC1 will never be used for the real computation, so these values need not correspond to b; they may, for example, be fixed to constants);

4. A and B run OT for B to obtain the encrypted value of b corresponding to B's input wires in GC2;

5. A hashes GC1 and sends the hash to B;

6. A hashes GC2 and sends the hash to B;

7. A signs the transcripts of all the above steps and sends the signatures to B;

8. Since B has s1, it can regenerate GC1 by itself and replay steps 3 and 5 on its own. If the results are inconsistent with what A sent, B publishes the relevant signature as evidence of A's malicious behavior. If they are consistent, GC2 is used for the real computation.

If A acts maliciously, there is always a 50% probability of being caught by B, because A does not know which GC seed is in B's hands. A rational A would therefore choose not to act maliciously, and would instead execute the secure multi-party computation protocol faithfully.
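The cut-and-choose check at the heart of the scheme can be sketched in a few lines. The garbling, OT, and signature primitives are stubbed out here with hashes and HMAC (purely illustrative stand-ins; the real protocol uses the GC and OT machinery above plus actual digital signatures):

```python
import hashlib, hmac, os, random

def garble_from_seed(seed: bytes) -> bytes:
    # Stub: deterministically "garble" the agreed circuit from a seed.
    # A real implementation expands the seed into all wire keys and rows.
    return hashlib.sha256(b"GC-of-F|" + seed).digest()

sign_key = os.urandom(32)                    # A's signing key (stub)
def sign(msg: bytes) -> bytes:               # HMAC as a stand-in signature
    return hmac.new(sign_key, msg, hashlib.sha256).digest()

# Steps 1-2: A picks two seeds; B learns exactly one via OT, so A does
# not know which seed B holds.
s = [os.urandom(16), os.urandom(16)]
j = random.randrange(2)                      # the index B obtained
GC = [garble_from_seed(s[0]), garble_from_seed(s[1])]

# Suppose A cheats by corrupting one of the two circuits at random.
GC[random.randrange(2)] = b"maliciously garbled circuit"

# Steps 5-7: A sends the hashes of both circuits, signed.
hashes = [hashlib.sha256(c).digest() for c in GC]
signature = sign(b"|".join(hashes))

# Step 8: B regenerates GC_j from its seed and checks A's hash.
expected = hashlib.sha256(garble_from_seed(s[j])).digest()
if expected != hashes[j]:
    # The signed transcript is a publicly verifiable certificate: anyone
    # with A's public key can verify the signature and redo this check.
    print("Caught! Publish the transcript and signature as evidence.")
else:
    print("Check passed; the other circuit is used for the real computation.")
```

Because A corrupts a circuit without knowing j, the check fails (and A is exposed) with probability exactly 1/2 per run, matching the deterrent factor λ = 50%.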

This article only deals with the relevant technical concepts on a very basic level. Interested readers can learn more by reading the full paper here.

Further Reading

[1] Yao A.C. How to Generate and Exchange Secrets. FOCS 1986: 162–167

[2] Asharov G, Orlandi C. Calling Out Cheaters: Covert Security with Public Verifiability. Advances in Cryptology — ASIACRYPT 2012. Springer Berlin Heidelberg, 2012.

[3] Kolesnikov V, Malozemoff A J. Public Verifiability in the Covert Model (Almost) for Free. Advances in Cryptology — ASIACRYPT 2015. Springer Berlin Heidelberg, 2015.
