Honest Online Voting: Myth or Reality? by @web3judge

by Judge, September 27th, 2024

Too Long; Didn't Read

I seek a solution that can ensure authenticity, security, and transparency throughout the entire online voting process. I discuss how to create such a solution.


The idea of conducting online voting is not new. Various services are emerging, but traditional in-person voting with paper ballots remains the most common method for making important collective decisions. Think of national elections with millions of ballots, thousands of polling stations, observers, and volunteers. Or shareholder meetings of large corporations, where even announcing an upcoming meeting requires sending out a large number of registered letters with confirmation of receipt. Or general homeowners' meetings, where you have to catch a moment when each resident is neither at work, on vacation, nor at their country house in order to walk around all the apartments with a survey sheet. Yes, "paper" is slow and expensive. But despite all its drawbacks, "paper" voting continues to be actively used, partly due to the inertia of regulatory legislation and, probably to a larger extent, because of faith in the reliability of traditional, well-established procedures.


Nevertheless, remote solutions are emerging and claiming their place in the sun. Online voting is gaining popularity in both the public and corporate sectors. The advantages of "online" are obvious in terms of efficiency—it's much easier and cheaper to send out ballots electronically to people in different parts of the world than to gather them in a rented conference hall. The pandemic added nuances in the form of the need for "social distancing", further highlighting the benefits of remote decision-making methods.


The vast majority of online voting services are centralized solutions managed by a specific operator company. Are there threats to confidentiality and potential manipulation? From a technical standpoint, it’s clear that the operator of such a system has access to all its data. Accordingly, theoretically, they could interfere with the voting process or alter the results. But many business solutions rely on trust—and there's nothing wrong with that.


These systems are built not on "technical impossibility", but on the "economic unfeasibility" of interfering in the business processes of clients. Can a bank alter your account balance? Yes, it can. Will it do so? Most likely not (but that’s not certain). A legal infrastructure is built around such services in the form of contracts between the client and the service operator, non-disclosure agreements, government regulations, etc. As a result, it becomes a workable system: businesses do their job, and the online voting service operator helps the business make decisions. But it seems a certain unease remains.


But what if you need more? What if you simultaneously require economic efficiency, confidentiality of decisions, and the maximum "theoretical impossibility" for the operator of the online voting service to influence the outcome? Imagine a hypothetical situation where one global corporation, "Y", makes strategic development decisions using the infrastructure of another global corporation, "N". No matter how many agreements and contracts are signed, will "N" resist the temptation to peek at what its direct or indirect competitor is discussing? Will the management of "Y" sleep peacefully, confident that no one will interfere with their breakthrough development plans?


Should we, trusting in the perfection of our world, answer "yes" to both questions? Or can we create a technical solution that meets all the requirements? Let’s explore this further.

What must online voting look like?


At the most abstract level, all that information systems do is create/modify data, transmit data, store data, and provide access to that data. In the context of online voting, this looks as follows:


  • The voting organizer creates the agenda and wants to manage the access rules for this agenda: whether to make it public or disclose it only to the voting participants.
  • The voting organizer creates a list of participants and wants to be sure that only those listed will take part in the voting. The participants themselves are also interested in this, as they are expected to make decisions on issues important to them, and the opinions/intentions/interests of third parties should not influence that decision.
  • The organizer sets the voting rules. These rules—without revealing the content of the vote—should be accessible to the general public so the organizer, the participants, and third parties (e.g., regulators) can ensure that the decision-making method was chosen correctly and everything proceeded according to a pre-established plan.
  • A voting participant receives the voting agenda and wants to ensure that it is the exact list of questions that everyone else received, not a "shortened version" sent only to them. These concerns may seem absurd in paper-based voting, where all ballots are pre-printed, issued from a common stack, and there’s no possibility of "personalizing" a ballot. However, in the case of online voting, all we have are bytes of information that can be easily modified in real-time. Therefore, more guarantees of authenticity would be desirable.
  • A participant fills out a ballot and wants to ensure that their vote will be counted, their vote won’t be altered, and no one will be able to determine exactly how they voted (I'll leave aside specific cases of open voting).
  • The organizer and participants receive the voting results. At this point, everyone wants to be sure that no one interfered with the voting process, no ballots were filtered, no fraudulent ballots were added, no results were tampered with, etc. This includes even the operator of the online voting service.


In short, I seek a solution that can ensure authenticity, security, and transparency throughout the entire online voting process.

How Does Blockchain Fit In?

At a quick glance, the only problem with online voting seems to be the absolute power of the voting service operator. The good news is that with the emergence of blockchain, we have learned to limit this power by distributing the responsibility for system data among its participants. However, it's clear that blockchain is not a silver bullet. Essentially, it is a relatively slow and finicky database—a tool with certain positive properties and some drawbacks. To fully solve the problem, it’s necessary to properly define the role of blockchain in the system and understand what else we might be missing.

Distributed Storage

The immediate solution is to use blockchain to store voting data. With the proper deployment of a blockchain node network, we won’t need to worry about the service operator single-handedly altering the recorded data. The problem of securely storing historical information is solved.


Additionally, we can solve the issue of data authenticity: all information that enters the blockchain (in the form of transactions) must be signed by the sender. The participant’s public key becomes their internal system identifier, and the signature on the submitted transaction proves that they indeed sent it.


Naturally, the question of storing confidential data arises. A blockchain by itself offers no confidentiality, since everything written to it is visible to every node, so we need to figure out which data in the system is public and which is private, as well as find ways to protect confidential information. For example, when creating a vote, we have the following set of data:


  • The voting agenda and materials;
  • Users' contact details—their real-world identifier (email or phone number);
  • User's personal data—which, generally speaking, is not required in the system but makes work easier for both the voting organizer and the participants;
  • A pair of public and private keys for the voting participant—these serve as the participant’s internal system identifier and allow them to participate in the vote.

Voting Agenda and Materials

We can handle the voting agenda and materials like any other cloud service: the voting organizer uploads them to a server (backend), and their availability is determined by a role-based model. The agenda is shown to voting participants while hidden from others. To ensure that nothing happens to the agenda on the backend, we save a hash of the materials stored on the server in the blockchain. The organizer can see that the content of the vote is exactly what was planned, and the voting participants can verify that everyone has received the same agenda.
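
The anchoring step can be illustrated with a minimal Python sketch. It assumes SHA-256 as the hash and treats the agenda as a single byte string; the function names `agenda_anchor` and `matches_anchor` and the sample agenda are hypothetical, and the actual write to the blockchain is left out.

```python
import hashlib

def agenda_anchor(materials: bytes) -> str:
    """Digest the organizer would publish on the blockchain (SHA-256 assumed)."""
    return hashlib.sha256(materials).hexdigest()

def matches_anchor(materials: bytes, anchor: str) -> bool:
    """A participant re-hashes what the backend served and compares it to the anchor."""
    return hashlib.sha256(materials).hexdigest() == anchor

# Hypothetical agenda uploaded by the organizer.
agenda = b"1. Approve the annual report\n2. Elect the board"
anchor = agenda_anchor(agenda)

# A "shortened version" served to a single participant no longer matches.
shortened = b"1. Approve the annual report"
print(matches_anchor(agenda, anchor), matches_anchor(shortened, anchor))
```

Because every participant compares against the same on-chain digest, a ballot "personalized" by the backend would be detected by the participant who received it.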

Personal and Contact Information

Personal and contact information is stored on the server as part of the user's account. Publishing this data on the blockchain would be wrong. On the blockchain, the list of voting participants must be saved as a list of public keys.


The key pair is created locally by the user on their personal device. The private key never leaves the device, while the public key is saved on the backend as part of the account. The voting organizer works with the participant list in the form of names and emails. When saving the voting data in the blockchain, the list of public keys is also sent. The vote is signed by the user’s key, and if the public key of the sender is on the participant list, the ballot is accepted. This scheme allows us, on the one hand, to keep users' personal data private and, on the other hand, to make the system more transparent.
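
As a rough illustration of the acceptance rule, here is a toy Python sketch. It assumes a Schnorr signature over a deliberately tiny group (P = 1019, Q = 509, G = 4) so that it runs with the standard library only; a real system would use the signature scheme of the chosen blockchain. All names (`keygen`, `sign`, `verify`, `accept_ballot`) are hypothetical, and the parameters offer no security.

```python
import hashlib
import secrets

# Toy parameters (an assumption for illustration): G generates the subgroup
# of prime order Q modulo the prime P = 2*Q + 1. Far too small for real use.
P, Q, G = 1019, 509, 4

def h(*values):
    """Hash arbitrary values to a challenge in [0, Q)."""
    data = ",".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def keygen():
    """Created locally on the user's device; only the public key is shared."""
    sk = secrets.randbelow(Q - 1) + 1
    return sk, pow(G, sk, P)

def sign(sk, message):
    k = secrets.randbelow(Q - 1) + 1
    e = h(pow(G, k, P), message)
    return e, (k + e * sk) % Q

def verify(pk, message, sig):
    e, s = sig
    r = pow(G, s, P) * pow(pk, Q - e, P) % P   # G^s * pk^(-e)
    return h(r, message) == e

def accept_ballot(participants, pk, ballot, sig):
    """A ballot counts only if the sender is listed and the signature is valid."""
    return pk in participants and verify(pk, ballot, sig)

alice_sk, alice_pk = keygen()
participants = {alice_pk}            # the on-chain list holds public keys only
ballot = "encrypted-ballot-bytes"    # placeholder for the encrypted ballot
sig = sign(alice_sk, ballot)
print(accept_ballot(participants, alice_pk, ballot, sig))
```

Note that the participant list never contains names or emails: the mapping from a person to `alice_pk` lives only on the backend.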

Handling Submitted Ballots

However, simple hashing of completed ballots won’t be enough. Yes, hashing the ballot in the blockchain will protect it from tampering—but we also need to count the votes. To avoid making the system look like a circus act where ballots and results appear magically out of a hat, we’ll try to make everything transparent and save not just the fact of vote submission but also its contents in the blockchain. Of course, this should be done in such a way that everything remains confidential.


After receiving the ballot from the server and verifying that the agenda matches the hash stored in the blockchain, the voting participant fills out the form, then encrypts it using the publicly known voting public key, and finally signs it with their private key (in the form of a blockchain transaction). As a result of these steps, we obtain a filled-out ballot that cannot be altered (it is signed) or read (it is encrypted).


The only thing that could happen to such a ballot is that it could be "lost." However, there is no reason to "lose" this specific ballot since we don’t know what the user voted for, and the disappearance of the ballot can easily be detected by both the voter and external observers. This is because both the list of registered voters and the ballots they submitted are visible in the blockchain. Moreover, the distributed nature of the blockchain allows the voting participant to send the transaction directly to any of the blockchain nodes, bypassing the backend or any other "intermediaries." The blockchain node is completely indifferent to the voting process or any other procedures being built on top of the blockchain network. All the network cares about is the correct format of the transaction: if everything is valid, the transaction, and therefore the vote, will be registered in the blockchain.

Intermediate Summary

I can say that:


  • The problem of transparency and immutable storage of historical data can be solved by using blockchain.
  • The visibility of confidential data is managed in a traditional centralized manner, but we can introduce an additional information control mechanism on the backend through hash "anchors" stored in the blockchain.
  • It is possible to separate the handling of information about the voters. Personal data will be stored in the traditional way, while interaction occurs through a unique blockchain identifier, which the user will use to send their vote.


However, even though I promised to do away with magic, it has still happened. To encrypt the ballots, we need a public voting key, but no one has said where it came from! Obviously, this is a crucial part of the entire voting process, and it must not be treated lightly. An even more interesting part of the puzzle is the private key corresponding to the voting public key, as it will be used to reveal the voting results. It's time to delve into cryptography (which, for 99.9% of people, doesn't differ much from magic).

Cryptography


So, we have achieved a situation where the data stored in the system cannot be altered. However, before the data enters the blockchain, there remains a wide scope for attackers to influence the outcome of the vote. For example, in the previous section, I mentioned the possibility of "losing" a vote. There is no motivation to do this if the attacker cannot read the encrypted ballot. But what if they can?


Suppose we decide to generate the voting key on the backend, send the public part to all participants, and after the voting concludes, use the private key to decrypt the ballots. Even at first glance, this implementation seems unreliable. What would prevent the system operator (or an attacker who gains access to the server) from obtaining the private voting key, decrypting the ballots before they reach the blockchain, and filtering out the "incorrectly filled-out" ones? Or even inserting their own public voting key on the server before the voting begins? In this case, only the attacker would have the corresponding private key, and no one else would be able to access the voting results. What if, during the voting process, the attacker gains the ability to decrypt the ballots stored in the blockchain, accesses the intermediate results, and somehow influences the final outcome?


A somewhat more reliable option is to split the private key after generation using a well-known scheme called Shamir's Secret Sharing. The key pair is generated, the public key is stored in the blockchain as the public voting key, and the private key is split into several parts, each independently held by a trusted participant. To finalize the voting results, the private key needs to be reassembled, after which the ballots can be decrypted. If one of the trusted participants is "unavailable", Shamir's scheme allows the private key to be reassembled from fewer parts. That is, if the key was split into N parts, it can be reassembled using any K of them, where K < N is a threshold chosen when the key is split.
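
A minimal Python sketch of Shamir's scheme may help here. It assumes the private key is an integer smaller than the Mersenne prime 2^127 - 1, which serves as the field modulus; the function names `split_secret` and `recover_secret` and the 3-of-5 parameters are hypothetical choices for illustration.

```python
import secrets

PRIME = 2**127 - 1   # field modulus (a Mersenne prime); must exceed the secret

def split_secret(secret, k, n):
    """Split `secret` into n shares; any k of them reconstruct it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    return [(x, f(x)) for x in range(1, n + 1)]

def recover_secret(shares):
    """Lagrange interpolation of the polynomial at x = 0."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret

private_key = secrets.randbelow(PRIME)          # stand-in for the voting private key
shares = split_secret(private_key, k=3, n=5)    # one share per trusted participant
print(recover_secret(shares[:3]) == private_key)
```

Note that `private_key` exists in one place before `split_secret` is called, which is exactly the weak moment discussed next.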


This option seems much more secure and advanced, and it really is. But there are nuances. First, in the time between key generation and its splitting, the key is an obvious target for attackers and a single point of failure in the system. Second, once the key is reassembled, we can decrypt each individual ballot. This won't allow us to filter them out, as they are already in the blockchain and cannot be removed. However, it does compromise the confidentiality of the vote. The backend contains the link between the participant's name and their public key, and the blockchain holds the link between the public key and the now-decrypted ballot. Whoever holds both knows how each person voted and could, for example, deny bonuses to undesired employees.


Of course, there are mechanisms to break the first link, between personal data and the public key, using a technique called blind signatures. However, this is a rather specific mechanism that needs to be implemented correctly. Even then, there might still be the possibility of "tracking by IP". The user calls an authenticated endpoint to receive a blind signature and later calls an unauthenticated endpoint to submit the vote. Formally, in the second case, we don’t know who is coming to us and rely solely on verifying the blind signature. But we still have the ability to match device/browser/connection parameters and determine that it is indeed the same Alex who received the blind signature from us five minutes earlier. A similar attack correlates the time when the signature was received with the time when the vote was submitted. When votes come in a rush at 500 per second, such an attack loses effectiveness, but at lower volumes, it could work quite well.


Can we do better?

Distributed Key Generation

Let’s give it a try. We can achieve this with the help of a secure multi-party computation (SMPC) protocol, which allows several participants to perform a calculation based on each of their secret inputs in such a way that no participant can obtain any information about the other participants' secret inputs. With such a protocol, multiple parties jointly compute a common public voting key and a corresponding set of private keys. These private keys are generated independently, and the participants in the SMPC protocol never exchange them, which eliminates the single point of failure. Furthermore, we can implement a threshold scheme, such as K out of N, similar to Shamir's scheme.


The DKG (Distributed Key Generation) algorithm from Torben Pryds Pedersen's paper, "A Threshold Cryptosystem Without a Trusted Party", is used to generate the common public voting key (MainPublicKey). The algorithm is adapted to elliptic curves (the original paper uses the multiplicative group of a finite field (Galois field) GF(p)). One limitation is that if any participant complains (for example, if the checksum doesn't match) about another, the entire key generation process must be restarted from the beginning.
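
To give a feel for the idea, here is a simplified Python sketch in the spirit of Pedersen's DKG. It is an assumption-laden toy, not the adapted algorithm described above: it uses a tiny multiplicative group (P = 1019, Q = 509, G = 4) instead of an elliptic curve, runs all parties in one process, and reduces complaint handling to raising an error. The names `deal`, `share_is_valid`, `run_dkg`, and `lagrange_at_zero` are hypothetical.

```python
import secrets

# Toy group (assumption): G generates the subgroup of prime order Q mod P.
P, Q, G = 1019, 509, 4

def deal(k, n):
    """One party's contribution: public commitments and one share per party."""
    coeffs = [secrets.randbelow(Q) for _ in range(k)]    # secret polynomial
    commitments = [pow(G, a, P) for a in coeffs]         # published on-chain
    shares = {j: sum(a * pow(j, t, Q) for t, a in enumerate(coeffs)) % Q
              for j in range(1, n + 1)}                  # sent privately to party j
    return commitments, shares

def share_is_valid(j, share, commitments):
    """Party j checks its share against the dealer's public commitments."""
    expected = 1
    for t, c in enumerate(commitments):
        expected = expected * pow(c, pow(j, t, Q), P) % P
    return pow(G, share, P) == expected

def run_dkg(k, n):
    deals = [deal(k, n) for _ in range(n)]
    for commitments, shares in deals:
        for j, s in shares.items():
            if not share_is_valid(j, s, commitments):
                raise ValueError("complaint raised: restart key generation")
    main_public_key = 1
    for commitments, _ in deals:
        main_public_key = main_public_key * commitments[0] % P
    # Each party sums the shares it received; the full private key is never assembled.
    private_shares = {j: sum(shares[j] for _, shares in deals) % Q
                      for j in range(1, n + 1)}
    return main_public_key, private_shares

def lagrange_at_zero(j, ids):
    """Lagrange coefficient of party j for the subset `ids`, evaluated at 0."""
    num, den = 1, 1
    for m in ids:
        if m != j:
            num = num * (-m) % Q
            den = den * (j - m) % Q
    return num * pow(den, -1, Q) % Q

main_public_key, private_shares = run_dkg(k=2, n=3)
print(main_public_key)
```

The design point is that `main_public_key` is derived only from public commitments, while each entry of `private_shares` stays with its own party.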


We should use standard elliptic curves such as secp256k1 (used by Bitcoin and Ethereum) and the SHA-256 hash function. It’s easy to add other options, such as Ed25519, if necessary. We can also switch from 256-bit curves to 512-bit curves.


The participants in DKG will be several instances of cryptographic services performing all the cryptographic "magic". To avoid any concerns about "opaque" communication between services, the only interface these services will have is interaction with the blockchain node. One decryptor corresponds to one node. A 1-to-1 ratio isn't strictly required (there could be more or fewer decryptors than nodes), but it seems logical to have the same level of decentralization on both the data storage/transport layer (as discussed in the previous section) and the cryptographic layer.


The protocol for creating a public key will be launched for each new vote, which will help reduce the damage to the system’s data in the event of key compromise.


Homomorphic Encryption

Now we have a solid set of tools: we've protected historical data (blockchain), learned how to create encryption and decryption keys without a single point of failure (DKG), and can use these keys in a transparent way, free from a single weak link. Both tools are based on the overarching ideology of decentralization, where we don't have to trust a single system operator but can instead rely on multiple independent participants.


However, we still need to address what exactly we will encrypt and how. There's also the issue of maintaining voting confidentiality, as mentioned earlier when decrypting individual ballots.


This is where homomorphic encryption comes in. This technique allows for arithmetic operations to be performed on encrypted data without the need for decryption. In a system that is homomorphic for addition, for example, we can add two encrypted numbers, get their encrypted sum, and only decrypt the result of the addition.



We can leverage this feature in voting by representing a ballot as a matrix, where each row corresponds to a separate question for which a decision is being made, and the cells within the row represent the possible answers to that question. When filling out the ballot, the voter selects an option, which we mark with a "1". The remaining cells are filled with zeros. Each cell of the ballot is then encrypted. It is in this encrypted form that the ballot is recorded on the blockchain. What is encrypted in each cell remains unknown to us.

Ballot in the form of a matrix of questions and answer choices


When the voting ends, we don't decrypt the individual ballots; instead, we sum the encrypted values for each possible answer. Using the property of homomorphic encryption, we can calculate the total number of votes for each option without ever knowing how individual participants voted. Only the final tallied results are decrypted. This ensures that while the system can still calculate the election results, it never compromises the privacy of the individual voters.


Thanks to homomorphic encryption, we can perform a "column-wise addition" for each question across all the received ballots. Afterward, we only decrypt the final voting results without decrypting individual ballots.


Encrypted scoring of results


As a result, we achieve not only collaborative and transparent work among several independent services in tallying the voting results—none of which can falsify the outcome—but also a very high level of voting confidentiality. We've anonymized voter choices through "column-wise addition"—we know the total sum but not the individual components!


To achieve this, we can use the ElGamal cryptosystem. Originally, this cryptosystem was based on the discrete logarithm problem in a finite field and is homomorphic with respect to multiplication, but it can be modified to work on elliptic curves and adapted to be homomorphic with respect to addition (for example, by encoding each value m as the curve point m * G before encryption).
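
The "column-wise addition" can be sketched in Python with an additively homomorphic ("exponential") ElGamal variant. This is a toy under stated assumptions: a tiny multiplicative group (P = 1019, Q = 509, G = 4) stands in for an elliptic curve, a single key pair stands in for the distributed key, and the final tally is recovered by brute-forcing a small exponent. All function names and the sample ballots are hypothetical.

```python
import secrets

# Toy group (assumption): G generates the subgroup of prime order Q mod P.
P, Q, G = 1019, 509, 4

def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)

def encrypt(value, pub):
    """Encrypt G^value, so multiplying ciphertexts adds the plaintext values."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), pow(G, value, P) * pow(pub, r, P) % P

def add(c1, c2):
    """Homomorphic addition: component-wise product of two ciphertexts."""
    return c1[0] * c2[0] % P, c1[1] * c2[1] % P

def decrypt_total(c, priv, max_total):
    a, b = c
    g_m = b * pow(a, Q - priv, P) % P        # b / a^priv = G^total
    for m in range(max_total + 1):           # totals are small, so search for m
        if pow(G, m, P) == g_m:
            return m
    raise ValueError("total out of range")

priv, pub = keygen()
# One question with three answer options; each row is one voter's ballot.
ballots = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0]]
encrypted = [[encrypt(cell, pub) for cell in row] for row in ballots]

totals = []
for column in zip(*encrypted):               # sum each answer option separately
    acc = column[0]
    for c in column[1:]:
        acc = add(acc, c)
    totals.append(decrypt_total(acc, priv, max_total=len(ballots)))
print(totals)  # [3, 1, 1]
```

Only the three aggregate ciphertexts are ever decrypted; the fifteen individual cells stay encrypted.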


This way, we have done everything possible to protect the voter. Their vote cannot be forged, read, or practically blocked or "lost". The voter cannot be punished for selecting the "wrong" candidate.


Moreover, I can offer a solution to the common problem in voting systems where voters cast their ballots not from the protected and isolated space of a voting booth but from arbitrary locations and unpredictable environments—the problem of coercion. This occurs when voters are pressured to vote for a specific candidate. Since the vote cast under pressure is technically valid, the system will accept it as coming from the voter's public key. However, nothing prevents the voter from submitting their vote multiple times. Each submitted ballot will be recorded in the system, but when tallying the results, the system will count only the last ballot sent from the voter's public key.
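
The "last ballot wins" rule is easy to express in code. The following Python sketch assumes the transactions are already in blockchain order and uses placeholder strings in place of public keys and encrypted ballots; `effective_ballots` is a hypothetical helper name.

```python
def effective_ballots(transactions):
    """Keep only the most recent ballot per public key.

    `transactions` is an iterable of (sender_public_key, ballot) pairs in the
    order they were recorded on the chain.
    """
    latest = {}
    for sender_public_key, ballot in transactions:
        latest[sender_public_key] = ballot   # a later ballot overrides an earlier one
    return latest

# Alice first votes under pressure, then quietly re-votes.
chain = [("pk_alice", "ballot-A"), ("pk_bob", "ballot-B"), ("pk_alice", "ballot-C")]
print(effective_ballots(chain))
```

All three transactions remain on the chain, but only `ballot-C` and `ballot-B` would enter the tally.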

Zero-Knowledge Proofs (ZKP)

It’s not just the organizer or an external attacker who might have the motivation to influence the election results. A malicious actor could also be among the voting participants. Since we use a simple "column summation" mechanism and do not look into individual ballots, a voter could tamper with the client application and create a ballot where, instead of marking "1" for their candidate, they place "100500". Or put “–100500” for a candidate they dislike. The system would tally the results, and the malicious voter's candidate would win by an overwhelming margin.

To prevent this, I propose to employ another crypto-magic technique—Zero-Knowledge Proofs (ZKP). This method allows one to prove knowledge of some information without revealing it or to prove that a value was encrypted correctly (for example, that it is within acceptable limits) without disclosing the actual value.


One of the clearest demonstrations of how ZKP (in its interactive form) works is the “Ali Baba’s Cave” example.



Participant "A" has a key that unlocks a door in a labyrinth and wants to prove it to Participant "B" without showing the key. To convince "B" of "A’s" claim, they organize a series of tests:


  • "A" enters the labyrinth while "B" turns away. "B" doesn’t know which direction "A" took.

  • "B" instructs "A" to exit from a specific side, for example, the left.

  • If "A" truly has the key, they can exit from either side and follow "B’s" instructions.


The chance that "A" simply got lucky and initially chose the left path is 50%. Therefore, they repeat the test over and over until the probability of "A" simply guessing becomes negligibly small, and "B" accepts that "A" really does possess the key. During this process, "B" never sees the key itself or gains any of "A’s" information (such as which direction "A" chooses in each test), but after a series of tests, "B" receives a reliable (probabilistic, but with any necessary degree of certainty) proof.
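
To put a number on "negligibly small", assume the rounds are independent and that "B" picks the exit side uniformly at random in each round. Then a prover without the key survives n rounds with probability:

```latex
\Pr[\text{prover without the key passes all } n \text{ rounds}]
  = \left(\tfrac{1}{2}\right)^{n} = 2^{-n},
\qquad
n = 20 \;\Rightarrow\; 2^{-20} = \tfrac{1}{1\,048\,576} \approx 9.5 \times 10^{-7}
```

So twenty rounds already push the chance of a lucky guesser below one in a million, and each extra round halves it again.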


Non-interactive Zero-Knowledge Proofs simplify the interaction between parties, and that’s exactly what I propose to use. To protect against dishonest voters, we can apply the following proof scheme.

ZKP on Ballots

For each question, in addition to the filled and encrypted cells with the voting options, we will attach a NIZKP (Non-Interactive Zero-Knowledge Proof) to each cell. This proof will demonstrate that the encrypted value in the cell is either "0" or "1" without revealing which one it is. Additionally, we will provide proof that the sum of the encrypted values for all cells in a question equals "1". This ensures that the voter can mark "1" in any cell but only select one option.
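
One common way to build the per-cell proof is a disjunctive ("OR") Chaum-Pedersen proof made non-interactive with a hash. The Python sketch below is a toy under stated assumptions: a tiny multiplicative group (P = 1019, Q = 509, G = 4) replaces the elliptic curve, the cell is encrypted with exponential ElGamal under a public key `h`, and the separate proof that the cells sum to "1" is omitted. The names `prove_bit` and `verify_bit` are hypothetical.

```python
import hashlib
import secrets

# Toy group (assumption): G generates the subgroup of prime order Q mod P.
P, Q, G = 1019, 509, 4

def challenge(*values):
    data = ",".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def inv(x):
    return pow(x, P - 2, P)   # modular inverse, valid because P is prime

def prove_bit(bit, r, a, b, h):
    """Prove that (a, b) = (G^r, G^bit * h^r) encrypts 0 or 1, hiding which."""
    fake = 1 - bit
    A, B, c, z = [0, 0], [0, 0], [0, 0], [0, 0]
    # Simulate the branch that is false by choosing its challenge and response first.
    c[fake], z[fake] = secrets.randbelow(Q), secrets.randbelow(Q)
    b_fake = b * inv(pow(G, fake, P)) % P
    A[fake] = pow(G, z[fake], P) * inv(pow(a, c[fake], P)) % P
    B[fake] = pow(h, z[fake], P) * inv(pow(b_fake, c[fake], P)) % P
    # Honest commitment for the branch that is true.
    w = secrets.randbelow(Q)
    A[bit], B[bit] = pow(G, w, P), pow(h, w, P)
    total = challenge(h, a, b, A[0], B[0], A[1], B[1])
    c[bit] = (total - c[fake]) % Q
    z[bit] = (w + c[bit] * r) % Q
    return A, B, c, z

def verify_bit(a, b, h, proof):
    A, B, c, z = proof
    if (c[0] + c[1]) % Q != challenge(h, a, b, A[0], B[0], A[1], B[1]):
        return False
    for i in (0, 1):
        b_i = b * inv(pow(G, i, P)) % P      # b / G^i equals h^r only if the cell holds i
        if pow(G, z[i], P) != A[i] * pow(a, c[i], P) % P:
            return False
        if pow(h, z[i], P) != B[i] * pow(b_i, c[i], P) % P:
            return False
    return True

h = pow(G, secrets.randbelow(Q - 1) + 1, P)            # voting public key
r = secrets.randbelow(Q - 1) + 1
bit = 1
a, b = pow(G, r, P), pow(G, bit, P) * pow(h, r, P) % P  # the encrypted cell
proof = prove_bit(bit, r, a, b, h)
print(verify_bit(a, b, h, proof))
```

In this toy group a cheating voter could still get lucky with probability of roughly 1/Q; with a 256-bit group order that chance becomes negligible.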


If we wish (and we do), we can extend this ZKP scheme to implement more complex voting systems. For example, in weighted voting, where each participant casts not just one vote but a number of votes proportional to their shareholding in a company, we would create a ZKP for the weight of the voter’s vote instead of simply "1". Or in multiple-choice voting, where a voter can select more than one option out of N, we add ZKP for a range of values [0, 1, 2, 3] for each cell. The total ZKP could allow a sum of [3], meaning the voter must distribute all their votes. Or we could allow a range [1, 2, 3], meaning they can select between 1 and 3 options, but cannot leave the question unanswered.


Ultimately, each ballot will be a sequence of ciphertexts and their corresponding proofs, where each "ciphertext and proof" pair corresponds to a specific answer option. At the end of this sequence, we append the proof for the sum of all ciphertexts for that question.


The system only accepts ballots where all ZKPs have passed the validity check. This means that a dishonest voter can spoil their ballot (as with a paper ballot), but they cannot exploit the fact that we don’t look into the contents of their vote — we have the means to verify that the ballot is correctly filled out without decrypting it.

ZKP on Partial Decryptions

Now, we have a list of ballots signed and submitted by valid voters. These ballots are correctly filled out and encrypted, and we don’t know who voted for whom. It’s time to obtain the voting results. Using the property of homomorphism, we have obtained encrypted results — the "aggregate ballot". This "aggregate ballot" is independently calculated by each of our cryptographic services, and they must perform the distributed decryption process.


The first obvious condition is that each decryptor must obtain the same final ballot as all others or the magic of distributed decryption will not work. However, this is not a major issue since we have a shared public list of ballots on the blockchain, which serves as the data source for all cryptographic services.


The second condition: if decryption doesn’t add up, and we suspect that some cryptographic services are attempting to "sabotage" the election, it would be useful to identify which service is malfunctioning. To address this, during the publication of partial decryptions, each cryptographic service generates and attaches a "ZK-decryption proof" using the Chaum-Pedersen ZKP algorithm. This proves knowledge of the value x in two relations: A = x * B and C = x * D (where A, B, C, and D are points on the same elliptic curve).
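
A Chaum-Pedersen decryption proof can be sketched in Python as follows. The sketch assumes a tiny multiplicative group (P = 1019, Q = 509, G = 4), so the relations A = x * B and C = x * D become y = G^x and d = a^x, where y is the service's public share, a is the first component of the aggregate ciphertext, and d is the partial decryption. The function names and the hash-based challenge are illustrative assumptions.

```python
import hashlib
import secrets

# Toy group (assumption): G generates the subgroup of prime order Q mod P.
P, Q, G = 1019, 509, 4

def challenge(*values):
    data = ",".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def partial_decrypt_with_proof(x, a):
    """Return d = a^x plus a proof that the same x satisfies y = G^x."""
    y, d = pow(G, x, P), pow(a, x, P)
    w = secrets.randbelow(Q)
    t1, t2 = pow(G, w, P), pow(a, w, P)      # commitments
    c = challenge(y, a, d, t1, t2)           # Fiat-Shamir challenge
    z = (w + c * x) % Q                      # response
    return d, (t1, t2, z)

def verify_partial_decryption(y, a, d, proof):
    t1, t2, z = proof
    c = challenge(y, a, d, t1, t2)
    return (pow(G, z, P) == t1 * pow(y, c, P) % P
            and pow(a, z, P) == t2 * pow(d, c, P) % P)

x = secrets.randbelow(Q - 1) + 1               # one service's private key share
y = pow(G, x, P)                               # its public share, known to everyone
a = pow(G, secrets.randbelow(Q - 1) + 1, P)    # from the aggregate ciphertext
d, proof = partial_decrypt_with_proof(x, a)
print(verify_partial_decryption(y, a, d, proof))
```

A service whose published proof fails this check is the one to investigate, without anyone having to learn its private share.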


Now, any qualified third-party observer can:


  • Independently perform homomorphic summation of valid ballots and obtain the final aggregate ballot;

  • Verify the decryption proofs of the aggregate ballot from each cryptographic service;

  • Independently perform the final decryption of the voting results using public data published throughout the voting process on the blockchain.


Awesome!

Smart Contracts

Phew, it seems I've described a voting protocol that uses distributed technologies and cryptography to provide honest voters and organizers with all the necessary guarantees, protecting the collective decision-making process from internal and external attackers. Of course, this assumes "proper" decentralization — that is, distributing the system nodes among independent participants; without this, we're just creating a fancy Google Form :)


Now, let’s talk about one more interesting component of the system — smart contracts. In principle, they are not mandatory. Everything I've described so far could work through simple "registration" of data and external processing on the backend. But in that case, we would have a chaotic "basket" on the blockchain where anyone could throw in data — and it would be up to us to make sense of it all. Ideally, this "basket" should be smart: it should know who is allowed to send data, be able to verify the data format, or even validate the data’s compliance with the business logic of the vote.


This is the role that smart contracts play on the blockchain. They are the "smart box" for voting that defines the voting rules, stores the list of participants (which public key belongs to a voter and which to a cryptographic service), and tracks the status model of the vote. The smart contract also registers the public data of the DKG protocol, verifies and records submitted votes, and logs the voting results.
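
The behavior of such a "smart box" can be modeled in plain Python. This is only a sketch of the state machine, not contract code for any particular blockchain: the class `VotingContract`, its methods, the three statuses, and the rule that publishing the DKG key opens the vote are all assumptions made for illustration, and senders are assumed to be authenticated by the chain's transaction signatures.

```python
from enum import Enum

class Status(Enum):
    KEY_GENERATION = 1
    OPEN = 2
    CLOSED = 3

class VotingContract:
    """Toy model of the on-chain voting box (hypothetical interface)."""

    def __init__(self, organizer, voters, decryptors, agenda_hash):
        self.organizer = organizer
        self.voters = set(voters)            # voter public keys
        self.decryptors = set(decryptors)    # cryptographic service public keys
        self.agenda_hash = agenda_hash       # anchor for the off-chain agenda
        self.main_public_key = None
        self.ballots = {}
        self.status = Status.KEY_GENERATION

    def register_main_key(self, sender, main_public_key):
        if sender not in self.decryptors or self.status is not Status.KEY_GENERATION:
            raise PermissionError("only a decryptor may publish the DKG result")
        self.main_public_key = main_public_key
        self.status = Status.OPEN

    def submit_ballot(self, sender, encrypted_ballot):
        if self.status is not Status.OPEN:
            raise PermissionError("voting is not open")
        if sender not in self.voters:
            raise PermissionError("sender is not on the participant list")
        # Only the latest ballot per key is kept for tallying in this sketch.
        self.ballots[sender] = encrypted_ballot

    def close(self, sender):
        if sender != self.organizer or self.status is not Status.OPEN:
            raise PermissionError("only the organizer may close an open vote")
        self.status = Status.CLOSED

contract = VotingContract("pk_org", ["pk_alice", "pk_bob"], ["pk_dec1"], "agenda-hash")
contract.register_main_key("pk_dec1", "main-public-key")
contract.submit_ballot("pk_alice", "ciphertext-1")
print(contract.status, list(contract.ballots))
```

Checks such as ballot format and ZKP validity would sit inside `submit_ballot` in a fuller model.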

Conclusion

Unfortunately, not all system operations can be handled by a smart contract. First, it can only operate on public data, so it cannot hold private keys or passwords, which are critical for the DKG process. Second, these are distributed computations, meaning they are expensive and slow. Not all "heavy" cryptography can be embedded in a smart contract. However, this is what the IT community must work on: optimizing cryptographic libraries and aiming for more complete automation of the voting process.


Another important direction is something developers, fascinated by the elegance of their technical solutions, tend to overlook: UX — user experience. It’s trendy and often appropriate to add blockchain, but try explaining to a user accustomed to the UX of modern mobile and web applications what their personal private key is and that they can’t do anything in the system without it.


The node is mining, the preloader is spinning, and the user is waiting, unsure what they’re even waiting for: "I don’t know; I send emails instantly, so why do I have to wait here?" That's why it is also very important to ensure the system retains familiar user features while preserving its technological advantages.


Finally, we must remember that every technical solution should address real-world needs and user pains. A blockchain voting protocol might be interesting to researchers and developers but not to businesses. Businesses want effective solutions to their problems. Voting systems must not only have the necessary technical characteristics but also fit into established business decision-making processes, comply with regulatory requirements, and so on.


I hope you found this article interesting.