Securely scaling blockchains is a significant problem to solve. Failing to do so could compromise the real-world impact of public-blockchain technology. Here I'll give a _light-talk_ about _Fraud Proofs_ and how they can help, but first, a little introduction to scalability.

### Blockchain scalability mini-review

If we dig a little into the numbers, we usually find that Visa and MasterCard have a transaction throughput at least 2–3 orders of magnitude higher than most public blockchains. Bitcoin has a transaction throughput of 6–7 transactions per second (tps), whereas Visa is within the range of 2,000–20,000 tps. These numbers are repeated all over the Internet, so take them with a grain of salt; _accept_ the order-of-magnitude difference.

Comparing Visa and Bitcoin throughput is quite unfair since their design priorities are different. Public blockchains compromise throughput for decentralization. This order of priorities is by design.

> In public blockchains, decentralization is king.

There are plenty of ideas on how to solve this problem, and they can be grouped by the layer of the blockchain stack they target. You may also hear about _on-chain_ vs. _off-chain_ solutions. Examples of _on-chain_ solutions include [increasing the block size](https://en.bitcoin.it/wiki/Block_size_limit_controversy) and [sharding](https://github.com/ethereum/wiki/wiki/Sharding-FAQs). On the other hand, examples of _off-chain_ solutions are _side-chains_, e.g., the [Bitcoin Lightning Network](https://lightning.network/) or [Ethereum Plasma](https://plasma.io/).

### Fraud proofs

A [recent paper](https://arxiv.org/pdf/1809.09044.pdf) from [Mustafa Al-Bassam](https://en.wikipedia.org/wiki/Mustafa_Al-Bassam), [Alberto Sonnino](https://sonnino.com/), and [Vitalik Buterin](https://es.wikipedia.org/wiki/Vitalik_Buterin) discusses a solution to reduce the ratio between the number of _light-nodes_ and _full-nodes_ without compromising decentralization, and thus security.
In their own words:

> Fraud and data availability proofs are key to enabling on-chain scaling of blockchains (e.g., via sharding or bigger blocks) while maintaining a strong assurance that on-chain data is available and valid.

The full details of the paper involve _Sparse Merkle Trees_, _Reed-Solomon_ _erasure codes_, and _random-sampling_ analysis. If you don't have a clue about these, don't worry. The goal of this article is to explain the big picture in plain English, mentioning how those tools fit into the puzzle.

#### An analogy of Fraud Proofs

Before getting into fraud proofs in the context of blockchains, it may be useful to go through the following mental exercise. Suppose you participate in a network where multiple peers share a database that contains big [prime numbers](https://en.wikipedia.org/wiki/Prime_number). Each node in this network works quite hard at adding new big prime numbers and sharing them with other peers.

As a peer in this network, we have two options:

1. Trust that new numbers are prime numbers.
2. For every new prime-number candidate, run some primality test to be sure that our database is consistent.

If the network is comprised of trusted peers, then the first option is quite attractive, since running primality tests on big prime-number candidates is costly. Additionally, this problem gets worse if the rate of new numbers to test is higher than our computing power, so we may fall behind.

On the contrary, if the network is trustless, then choosing the first option is risky. What if some malicious peer tries to trick the network and commits a big _composite number_? Having _composite numbers_ in our database is unacceptable.

What if we knew that other peers could send us proofs that some number in our database isn't prime?
For any given number _n_, the proof would consist of a single number greater than one and less than _n_ that divides _n_. This idea would have the following advantages:

* We could avoid running primality tests for every new number in the database. If after some time threshold we don't receive a fraud proof, we can consider the number prime.
* If at least one peer is checking the numbers, it could alert the whole network about newly added non-prime numbers.
* The size of the proof is small.
* Verifying the proof is fast.
* We could penalize the peer that added that number to the database for its misbehavior.

This idea is similar to fraud proofs. Instead of every node validating all the rules of the system, others can demonstrate that the current system state is wrong.

#### Light-nodes vs. Full-nodes

To understand why fraud proofs have value in blockchains, we should consider an important fact. To validate a new block, we must obtain the data corresponding to the block and then run the full validation ourselves.

We can think about blockchains as state machines. The current state of the blockchain corresponds to the state after processing the last known block. When a new block appears, we process its transactions and define the new current state. The nodes that do this work are called _full-nodes_.

Other participants in the network may not want to be _full-nodes_. If a node runs on an Android device, it has limited resources, such as computing power, storage, and bandwidth. These nodes are named _light-nodes_ since they only download block headers instead of all the data associated with them. _Light-nodes_ can do some minimal validation with the little information held in block headers, e.g., checking the _proof-of-work_ of the block. Checking _proof-of-work_ doesn't prove that the transactions are valid, nor the new state. _Light-nodes_ trust that the majority of _full-nodes_ are honest, since those are the ones doing the real validation.
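Coming back to the prime-number analogy for a moment: the asymmetry between producing a proof (expensive search) and verifying it (one division) can be sketched in a few lines of Python. The function names are illustrative, not from the paper:

```python
# Toy "fraud proof" for the prime-number database analogy.
# A proof that n is NOT prime is simply a non-trivial divisor of n.

def make_fraud_proof(n):
    """Search for a non-trivial divisor of n (expensive, like a primality test)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d  # the proof: a single small number
        d += 1
    return None  # n is prime; no fraud proof exists


def verify_fraud_proof(n, d):
    """Cheap check: anyone can verify the proof with a single division."""
    return 1 < d < n and n % d == 0


# A malicious peer commits 91 claiming it is prime: 91 = 7 * 13.
proof = make_fraud_proof(91)
assert proof is not None and verify_fraud_proof(91, proof)
```

The point of the sketch is the cost gap: only one peer has to pay for `make_fraud_proof`, while every other peer gets away with the trivial `verify_fraud_proof`.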
As mentioned before, increasing the block size is a proposed on-chain solution to the scaling problem. If we increase the maximum number of transactions in each block, then the throughput will be higher. On the other hand, it pressures _full-nodes_ into committing more resources to avoid falling behind. Nobody wants to buy a high-end laptop or a supercomputer to run a _full-node_; thus, there would be fewer _full-nodes_. This implies that most _light-nodes_ would be trusting a smaller set of _full-nodes_, which may incentivize the latter to act dishonestly.

**Increasing the block size isn't absurd, but it doesn't scale securely.**

What if a _light-node_ could receive a proof that a block is invalid without trusting the majority of _full-nodes_?

If we read the paper, we find that it talks about two kinds of fraud proofs:

* _State transition fraud proofs._
* _Data availability fraud proofs._

Each of them constructs a proof that some undesired situation has happened.

### State transition Fraud Proofs

Let's come back to the idea of blockchains as state machines. New states are created after all transactions in a block are applied. The first core concept described in the paper is including _intermediate states_ between transactions. Instead of only having a single Merkle root at the end of each block, we could add an intermediate Merkle root between every pair of transactions.

If a _full-node_ detects that a new block has an invalid state transition, it could tell a _light-node_ the following:

1. Hey, I found that this block is wrong.
2. After processing _n-1_ transactions from the beginning of the block, if I process the next transaction, the Merkle root of the state is different from the _intermediate state_ claimed by the miner.
3. I'm sending you a bunch of Merkle proofs with the minimal data needed so you can check by yourself.

As in many parts of most blockchains, Merkle proofs are a crucial tool for proving that some data corresponds to a block.
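The check in step 2 can be sketched in Python under heavy simplifications: the "state" is a plain balance map, and its commitment is a hash of the whole state instead of a real Merkle root (so no Merkle proofs are involved). Every name and data structure here is an illustrative assumption, not the paper's construction:

```python
# Minimal sketch of checking a state-transition fraud proof.
# Assumption: state commitment = hash of the serialized state
# (a stand-in for a real Merkle root over the state tree).
import hashlib
import json


def state_root(state):
    """Toy commitment to the state: hash of its sorted JSON serialization."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def apply_tx(state, tx):
    """Apply one balance-transfer transaction and return the new state."""
    new = dict(state)
    new[tx["from"]] -= tx["amount"]
    new[tx["to"]] = new.get(tx["to"], 0) + tx["amount"]
    return new


def is_fraudulent(pre_state, tx, claimed_post_root):
    """True if the miner's claimed intermediate root doesn't match re-execution."""
    return state_root(apply_tx(pre_state, tx)) != claimed_post_root


pre = {"alice": 10, "bob": 5}
tx = {"from": "alice", "to": "bob", "amount": 3}

honest_root = state_root(apply_tx(pre, tx))
assert not is_fraudulent(pre, tx, honest_root)   # honest miner: no fraud
assert is_fraudulent(pre, tx, "deadbeef")        # bogus claimed root: fraud
```

In the real scheme the _light-node_ doesn't re-hash the whole state; the Merkle proofs sent in step 3 let it recompute only the affected branches of the state tree.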
The _light-node_ only has to trust the block header it already has.

We don't need to save _intermediate states_ between each transaction. An option is saving them every two, five, or more transactions. Doing this would save space in the state tree but make fraud proofs bigger, since more Merkle proofs are necessary between states.

Fraud proofs like this are powerful since a single honest _full-node_ can prove to any number of _light-nodes_ that a block is invalid; no majorities are needed. If after some time threshold a _light-node_ doesn't receive a fraud proof, it seems reasonable to assume the block is valid. Including economic incentives for valid fraud proofs and penalties for invalid ones is essential for the desired full-node behavior.

### The Data Availability problem

Fraud proofs for state transitions are great, but they rely on a critical assumption: all the block data is available. If the block miner only publishes the block header without the corresponding data, the block can't be proven wrong, the same way I can't show that a sum of two numbers is wrong if I don't know the numbers. Moreover, even if 99% of the data is available, the remaining 1% may be essential to prove that a block is invalid. We need 100% data availability.

For block validation this is a strict requirement, since data could be unavailable for multiple reasons, not only withheld intentionally by malicious nodes. The right approach, then, is to make it hard for a malicious node to get away with withholding data.

> This converts a 100% data availability problem into a 50% data availability problem. — Vitalik Buterin

The proposed solution uses [Erasure Codes](https://en.wikipedia.org/wiki/Erasure_code), in particular multi-dimensional [Reed-Solomon](https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction) codes. In a nutshell, this means transforming the block data from M chunks into N chunks (M < N), so that any M of the N chunks are enough to reconstruct the original data.
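The any-M-of-N property can be illustrated with a toy one-dimensional Reed-Solomon code over a small prime field: treat the M chunks as evaluations of a polynomial of degree M-1, and extend it to N evaluation points. The field size, chunk values, and function names below are simplifying assumptions (the paper uses a multi-dimensional layout, and real systems use optimized libraries):

```python
# Toy Reed-Solomon erasure code: M data chunks -> N code chunks,
# where ANY M of the N chunks reconstruct the original data.
P = 257  # small prime field; big enough for byte-sized chunk values


def _interpolate(xs, ys, x):
    """Lagrange interpolation at point x, with all arithmetic mod P."""
    total = 0
    for xi, yi in zip(xs, ys):
        num, den = 1, 1
        for xj in xs:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Extend M chunks to n chunks: evaluate the interpolating polynomial at 0..n-1."""
    m = len(data)
    return [_interpolate(list(range(m)), data, x) for x in range(n)]


def reconstruct(points, m):
    """Recover the original M chunks from any M surviving (index, value) pairs."""
    xs = [p[0] for p in points[:m]]
    ys = [p[1] for p in points[:m]]
    return [_interpolate(xs, ys, x) for x in range(m)]


data = [10, 20, 30, 40]               # M = 4 original chunks
coded = encode(data, 2 * len(data))   # N = 2M = 8 extended chunks
# Lose half the chunks; any 4 survivors still recover the data.
survivors = [(i, coded[i]) for i in (0, 2, 5, 7)]
assert reconstruct(survivors, len(data)) == data
```

Note that the first M code chunks equal the original data (the polynomial passes through them), so honest nodes pay no decoding cost in the common case.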
In particular, the paper establishes N = 2M. The multi-dimensional characteristic allows reconstructing partial data without the need to restore all M chunks. Moreover, this partial data has its own Merkle root, which is useful for proving that the data is correctly available (more on this later). The paper provides a helpful way to visualize this idea:

_(Figure on page 13 of the paper.)_

This powerful tool, together with pulling data via _random-sampling_, provides strong mathematical guarantees for the reconstruction of the original data. We can find the full mathematical justification in the paper, but here's an important conclusion:

_(Result on page 21 of the paper.)_

In plain English: say you have 1,000 _light-nodes_ doing _random-sampling_ of the extended chunks of data. Increasing the sample size ensures that, eventually, some of them will ask for a chunk that the malicious party doesn't want to share, thus giving more certainty about detecting a data-availability problem.

The paper also discusses another kind of fraud proof within data availability: proving that the miner generated the Reed-Solomon codes incorrectly. Since we need M out of N chunks to reconstruct a partition of the block data, this may be enough to detect whether the N chunks match the corresponding Merkle root of the partition.

### Conclusion

Fraud proofs and erasure codes are useful tools for scaling public blockchains. They empower _light-nodes_ to form their own opinion in discarding blocks without trusting a majority of honest _full-nodes_.
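Before closing, the random-sampling argument above can be made tangible with a back-of-the-envelope calculation. With N = 2M, a miner must withhold roughly half of the extended chunks to block reconstruction, so each random sample has about a 50% chance of hitting a withheld chunk. The numbers below are illustrative, not the paper's exact parameters:

```python
# How likely is it that s random samples all MISS the withheld chunks?
# (Sampling without replacement from the N extended chunks.)

def miss_probability(n, withheld, samples):
    """P(every one of `samples` draws avoids all `withheld` chunks)."""
    p = 1.0
    available = n - withheld
    for i in range(samples):
        p *= (available - i) / (n - i)
    return p


n = 256         # N extended chunks (illustrative)
withheld = 128  # ~half must be hidden to prevent reconstruction when N = 2M
for s in (5, 10, 20):
    print(f"{s} samples -> miss probability {miss_probability(n, withheld, s):.6f}")
```

The miss probability shrinks roughly like 2^-s per node, and with many _light-nodes_ sampling independently, a withholding miner is detected almost surely.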
Moreover, solving the data-availability problem not only allows fraud proofs to be constructed but also addresses other issues:

> [Even if succinct zero knowledge proofs can be used to verify correctness, an attacker getting away with publishing unavailable blocks and getting them included in the chain is still very bad, as such a thing happening denies all other validators the ability to fully calculate the state, or to make blocks that interact with the portion of the state that is no longer accessible.](https://github.com/ethereum/research/wiki/A-note-on-data-availability-and-erasure-coding)

In short:

* Merkle proofs are essential for proving that data belongs to a block.
* _Intermediate states_ significantly reduce the number of Merkle proofs needed to prove that a state transition is wrong.
* Constructing those Merkle proofs naively requires a 100% data availability guarantee, which isn't realistic.
* Erasure codes plus random-sampling give strong mathematical guarantees that all the data needed from the block will be available, even if the miner is malicious.
* A miner trying to trick the network by sharing wrong codes can be detected thanks to the multi-dimensional characteristic of Reed-Solomon codes plus the Merkle roots of partial data.

The mentioned paper, as well as its companion post, also indicates that [zk-SNARKs/zk-STARKs](https://en.wikipedia.org/wiki/Non-interactive_zero-knowledge_proof) are other options for proving correct state transitions and data availability. But those are different and exciting monsters.

Most of what I mentioned in this article is a simple summary of the full idea within those resources. I learned a lot from reading them, so I encourage you to do the same!