This post gives a short overview of some of the arguments for and against sharding and our plan for “scaling” Tezos.
TL;DR: scaling isn’t really quadratic, a very high throughput can be sustained without significantly hurting decentralization, and a far more powerful technique is right around the corner anyway.
The blockchain model is often described as having every participant validate every transaction. If the number of transactions increases as the number of participants, then collectively the network will be performing a quadratic amount of validation in order to progress. It is often argued that, as long as this remains the dominant paradigm, no tweak can possibly represent a “scalability” improvement. This is used to justify “sharding” as a scaling technique, an approach which lets validators work on separate sub-chains which are braided together periodically. However, the idea of quadratic scaling is only correct in a narrow, theoretical, and asymptotic, sense but not in a practically relevant way.
First of all, a major bottleneck to transaction throughput in the Bitcoin blockchain is the requirement that validation time be very short compared to the block interval in order to avoid orphaning blocks. Proof-of-stake blockchains with a synchronous consensus model only require that the validation time be shorter than the block interval (some proof-of-work proposals like Bitcoin-NG are also immune to this problem). Assuming a validation speed of 17 seconds/MB, a one to two orders of magnitude increase in throughput can be gained. While every validator is still required to validate every transaction, a two orders of magnitudes gain is nothing to sneer at!
Second, and more importantly, this line of thought conflates participating in the network as a validator and using the network. Simple payment verification, a technique described in the original Bitcoin whitepaper allows use of the network without active validation while extending a minimal amount of trust. In general, the more independent parties validate the chain, the less likely they are to collude and corrupt the content of the ledger. A more cynical (or perhaps prudent) viewpoint would be that every party is fundamentally corruptible and thus that the security of the network increases linearly with the number of validators along with the cost of bribes. But even then, the security of the network is essentially a function of the number of independent validators, not a function of the number of participants. As the number of participant increases, the number of validators need not increase proportionally and may become a vanishingly small fraction of the network without a loss of security.
A more sophisticated argument is that freedom of entry in the set of validators is key to maintaining the validators honest. A stagnant set of validators can soon become a stable cartel if not swiftly outcompeted by hungry newcomers. If the computing requirements for validating the blockchain are substantial, it could create a technological moat protecting that cartel of validators. Ethereum’s stated goal for sharding its blockchain is that a “cheap laptop” should be sufficient to validate the lightweight sub-chains involved in the scheme. They are also hinting at an economic model where validators place bets to convince other validators they need not double check their work.
Let’s look at some actual numbers. Visa’s peak transaction rate is 4,000 transactions per second. An ed25519 verify operation takes 273,364 cycles. On a modern 3Ghz computer this means more than 4,000 signatures can be verified in 0.36s. Of course there is more to validating a blockchain than merely verifying signatures, but this tends to dominate the computational cost.
I may be too conservative in using Visa’s peak transaction rate. After all, the promise of microtransactions and the machine payable web suggest an increase in the demand for transactions. Let us boldly increase that rate one hundred fold. A back of the napkin calculation suggest that, conservatively, a rate of 400,000 transactions per second could be sustained today with a gigabit connection (which can be obtained for only a couple hundred of dollars a year in most OECD countries, the US sadly being a notable exception) and with a cluster of computers costing less than $20,000. Five years from now, this computing power will likely available for under $2,000. These are the numbers for a sustained transaction rate one hundred time greater than Visa’s peak transaction rate.
Even if we see a surge in the demand for transaction throughput, the cost of computing power and bandwidth will keep dropping precipitously. And of course, there are many off-chain scaling solutions such as payment channels which are much more sensible ways of implementing micropayments.
The engineering effort Ethereum is undertaking to shard its blockchain seems highly complex to me. To their greatest credit, they seem to be approaching the problem with the tools of formal verification, including a verification of the economic hypothesis around the cost of bribing validators. I applaud that approach (we place a great deal of importance on formal verification in Tezos, and have designed our architecture to facilitate it) but I wonder why they would undertake such a complex endeavor when the requirement to validate a high throughput are still reasonable, especially when compared to the costs involved in running a typical mining operation.
My read is that people in this space have overly romanticized chain validation and consensus participation. It is often said that Bitcoin is a “permissionless” network. To me, permissionless means that I can set up a website and start accepting bitcoins immediately. In particular, I do not need a bank’s approval to start receiving or sending bitcoins. It is a powerful source of financial freedom. However, many people have come to see the ability to mine as somehow the defining characteristic that makes Bitcoin “permissionless”. I find this puzzling. At best, we can argue that the mining algorithm enables permissionlessness but, in fine, the activity of interest is transacting and storing value, not hashing.
As we’ve said before, a low barrier to entry in validation is important in maintaining a healthy decentralized and censorship resistant network, but once this censorship resistance is achieved, the fact that I, as a user, can become a small time miner on the chain and marginally contribute is pragmatically irrelevant. At best I’ll be making a symbolic statement, at worst I’ll be making a gamble between electricity prices and bitcoin prices. Given the amounts that I typically transact, I hardly need to bother with anything more than a SPV level of security. Anyone transacting amounts worth fully validating the chain can well afford the cost to do so. The idea that chain validation must imperatively be accessible for dirt cheap seems to stem more from a misplaced egalitarian ideology than from actual economic or security arguments.
It turns out that a silver bullet for chain validation is right on the horizon and under active research: recursive SNARKs. SNARKs, which stands for succinct non-interactive zero-knowledge proofs of knowledge are the technology used in Zcash for protecting the privacy of transactions (if you’re already objecting that SNARKs require a trusted setup, please bear with us, we have good news for you). Beyond their privacy preserving benefits, SNARKs can be a massively powerful tool to increase scalability, and it is the one we hope to harness in future iterations of Tezos.
Here’s how this works: take a piece of computer code such as the code which validates a transaction sending tokens from A to B. From this code, you can generate a zero-knowledge proof that you know of a valid transaction which transfers tokens from A to B. Instead of submitting the transaction itself to a miner, you could merely submit the change in balances alongside with the proof. This proof can be verified very efficiently by any validator, in only a few milliseconds, regardless of the complexity of the transaction. This means that gas limits for smart contracts would become a thing of the past. You could run a smart contract on your machine, you could let it perform hours and hours of computation and then submit a tiny proof to the network that you did the calculation and the numbers do add up. This very counter intuitive possibility is a consequence of the PCP theorem. Rather than try to engage in economic “bets” that the transaction has been properly validated we can obtain true cryptographic assurance.
This is barely scratching the surface. It is possible to generate proofs that you have witnessed and validated… another proof (hence the name “recursive SNARK”). As a result, the block creator could aggregate all the proofs that they received and produce a single proof establishing they have seen all these other proofs. The transactions would still be entirely private, including to the block creator, but they would all be condensed into a single cryptographic proof. The block creator would then publish that one, single, proof alongside with the result of applying the transactions.
Pushing this further, we may imagine a blockchain where each block consists solely of the root hash of the content of the ledger alongside a proof that valid transactions (including smart contract transactions) have been made that moved the ledger into this new state and that the proof present in the previous block was itself valid.
In practice, this means that a user could sync with the blockchain from scratch and validate it all the way from the genesis block in less than one second.
SNARKs famously suffer from one drawback: they require a trusted setup which can be performed safely but can’t be audited after the fact. However, a different construct has recently been unveiled: STARKs. They share similar properties but do away with the issue of the trusted setup and rely on fewer mathematical conjectures. Icing on the cake: they’re also resistant to quantum computing. I recommend this excellent talk by Eli Ben Sasson which presents his research on the topic.
What’s the catch? Why don’t blockchains currently use this magic technique? Unfortunately, this method’s Achilles Heel is that proof generation still takes a considerable amount of time, more than would be required for the scheme I’ve described to be practical. Proof-generation of proofs that another proof has been validated (recursive SNARKs) can still take on the order of a minute. But research is very active in the area and proof generation speed has already considerably improved. STARK proof generation is already three times faster than SNARK’s. We are making a calculated bet that the speed will keep increasing rapidly in the coming few years. There does seem to be a few low hanging fruits to pick.
Among those low hanging fruits, we are particularly interested in the possibility of designing a programming language (programming language theory is our hammer…) that is both efficient to implement in the computational model of STARKs (polynomial iteration over a finite field) and adequate at describing the logic of smart contracts. Another possibility worth exploring would be to build dedicated hardware for this task.
In summary, here is our plan for scaling the Tezos transaction throughput: