Securely scaling blockchains is a significant problem to solve. Failing to do so could compromise real-world impact of public-blockchain technology. Here I’ll give a about and how they can help, but before some little introduction into scalability. light-talk Fraud Proofs Blockchain scalability mini-review If we dig a little in the numbers, we usually find that Visa and MasterCard have a transaction throughput at least 2–3 orders of magnitude higher than most public blockchains. Bitcoin has a transaction throughput of 6–7 transactions per second (tps), whereas Visa is within the range of 2,000–20,000 tps. These numbers are repeated all over the Internet so take them with a grain of salt; the order of magnitude difference. accept Comparing Visa and Bitcoin throughput is quite unfair since their design priority orders are different. Public blockchains compromise throughput for decentralization. This order of priorities is by design. In public blockchains decentralization is king. There are plenty of ideas on how to solve this problem which can be grouped depending on the layer of the blockchain-stack. Also, you can hear about vs. solutions. on-chain off-chain Examples of o_n-chain_ solutions include and . On the other hand, examples of solutions are , e.g., or . increasing the block size sharding off-chain side-chains Bitcoin Lightning Network Ethereum Plasma Fraud proofs A from , , and discusses a solution to reduce the ratio between the number of vs. without compromising decentralization and thus security. recent paper Mustafa Al-Bassam Alberto Sonnino Vitalik Buterin light-nodes full-nodes In their own words: Fraud and data availability proofs are key to enabling on-chain scaling of blockchains (e.g., via sharding or bigger blocks) while maintaining a strong assurance that on-chain data is available and valid. The full details of the paper involve , as , and analysis. If you don’t have a clue about this, don’t worry. Sparse Merkle Trees Reed-Solomon erasure codes random-sampling The goal of this article is to explain the big picture in plain English mentioning how those tools fit in the puzzle. An analogy of Fraud Proofs Before getting into fraud proofs in the context of blockchains, it may be useful to go through the following mental exercise. Suppose you participate in a network where multiple peers share a database that contains big . Each node in this network works quite hard in adding new big prime numbers and sharing them with other peers. prime numbers If we are a peer in this network, we have two options: Trust that new numbers are prime numbers. For every new prime number candidate, run some primality test to be sure that our database is consistent. If the network is comprised of trusted peers, then choosing the first option is quite attractive, since running primality tests to big prime number candidates are costly. Additionally, this problem gets worse if the rate of new numbers to test is higher than our computing power so we may fall behind. On the contrary, if we network is trustless, then choosing the first option is risky. What if some malicious peer tries to trick the network and commits a big ? Having in our database is unacceptable. composite number composite numbers What if we know that other peers can send us proofs that some number in our database isn’t prime? For any given number , the proof would imply sending a single number less than and greater than one that divides This idea would have the following advantages: n n n. We could avoid running primality tests for every new number in the database. If after some time threshold we don’t receive a fraud-proof, we can consider the number is prime. If at least one peer is checking the numbers, it could alert the whole network about newly non.prime added numbers. The size of the proof is small. Verifying the proof is fast. We could penalize the peer that added that number to the database for his misbehavior. This idea is similar to fraud proofs. Instead of validating all the rules of the system, others can demonstrate that the current system state is wrong. Light-nodes vs. Full-nodes To understand why fraud-proofs have value in blockchains, we should consider an important fact. To validate a new block we should obtain the data corresponding to the block, and then run the full validation ourselves. We can think about blockchains as state-machines. The current state of the blockchain corresponds to the state after processing the last known block. When a new block appears, we process its transactions and define the new current state. The nodes that do this work are called . full-nodes Other participants in the network may not accept to be . If a node runs on an Android device, it has limited resources, such as computing power, storage, and bandwidth. These nodes are named since they only download block headers instead of all the data associated with it. full-nodes light-nodes can do some minimal validation with the little information held in block headers, e.g., checking of the block. checking doesn’t mean that transactions are valid, nor the new state. They trust that the majority of are honest since they are the ones that are making the real validation. Light-nodes proof-of-work Proof-of-work full-nodes As mentioned before, increasing block size is a proposed on-chain solution to the scaling problem. If we increase the maximum amount of transactions in each block, then the throughput will be higher. On the other hand, it pressures on committing more resources to not falling behind. full-nodes Nobody wants to buy a high-end laptop or a supercomputer to run a ; thus there will be fewer . The preceding implies that most are trusting a smaller set of which may incentivize them to act dishonestly. full-node full-nodes light-nodes full-nodes Increasing the block-size isn’t absurd, but it doesn’t scale securely. What if a receive a proof that a block is invalid without trusting the majority of ?. light-node full-nodes If we read the paper we can find that it talks about two kinds of : fraud-proofs State transition fraud-proofs. . Data availability fraud proofs Each of them tries to construct a proof that some undesired situation has happened. State transition Fraud Proofs Let’s come back to the idea of blockchains as state-machines. New states are created after all transactions in a block are applied. The first core concept described in the paper is including between transactions. Instead of only having a single Merkle root at the end of each block, we could add some intermediate Merkle root between every transaction. intermediate states If a detects that a new block has an invalid state transition, it could tell a the following: full-node light-node Hey, I found that this block is wrong. After processing transactions from the beginning of the block, if I process the next transaction, the Merkle root of the state is different from the following said by the miner. n-1 intermediate state I’m sending you a bunch of Merkle proofs with the minimal data needed so you can check by yourself. As in many parts of most blockchains, Merkle proofs are a crucial tool for proving that some data corresponds to a block. The only have to trust the block header it already has. light-node We don’t need to save between each transaction. An option is saving them every two, five or more transactions. Doing this would save space in the state tree, but make fraud proofs bigger since more Merkle proofs are necessary between states. intermediate states Fraud proofs like this are powerful since only one honest can prove to any number of that a block is invalid; no majorities are needed. If after some time threshold a doesn’t receive a fraud-proof, it seems reasonable that the block is valid. full-node light-nodes light-node Including economic incentives and penalizations for valid and invalid fraud proofs respectively is essential for desired full-nodes behavior. The Data Availability problem Fraud proofs for state transitions are great, but they rely on a critical assumption: all the block data is available. If the block miner only publishes the block header without the corresponding correct data, then it can’t be proven wrong. The same way I can’t show that a sum of two numbers is wrong if I don’t know them. Moreover, even if 99% the data is available, the remaining 1% may be essential to prove that a block is invalid. We need 100% data availability. For block validation, this is a strict requirement, since data could be unavailable for multiple reasons, not only intentionally by malicious nodes. The correct rationale is making data unavailability hard for a malicious node. This converts a 100% data availability problem into a 50% data availability problem. — Vitalik Buterin The proposed solution uses , in particular, multi-dimensional codes. In a nutshell, this means transforming the block data from M chunks to N chunks (M<N), so that any M chunks of N are enough to reconstruct the original data. In particular, the paper establishes N=2*M. The multi-dimensional characteristic allows reconstructing partial data without the need to restore all the M chunks. Moreover, this partial data has its own Merkle root that will be useful to prove if the data is correctly available (more on this later). Erasure Codes Reed-Solomon The paper provides a more helpful way to see this idea: Page 13 of the paper This powerful tool together with pulling data using provides strong mathematical guarantees for the reconstruction of the original data. We can find a full mathematical justification of this in the paper, but here’s an extract of an important conclusion: random-sampling Page 21 of the paper In plain English, say you have 1000 doing the of the extended chunks of data. Increasing the sample-size will ensure that, eventually, they will ask for a chunk that the malicious party won’t like to share. Thus giving more certainty about detecting a data-availability problem. light-nodes random-sampling In the paper, they also discuss another kind of fraud proofs within data availability, in particular proving that the miner generated the Reed-Solomon codes incorrectly. Since we need M out of N chunks to reconstruct a partition of block data, this may be enough to detect if the N chunks match the corresponding Merkle root of the partition. Conclusion Fraud proofs and erasure codes are useful tools for scaling public blockchains. They empower to have their own opinion in discarding blocks without trusting a majority of honest . light-nodes full-nodes Moreover, solving the data-availability problem not only allows fraud proofs to be constructed but also addresses other issues: Even if succinct zero knowledge proofs can be used to verify correctness, an attacker getting away with publishing unavailable blocks and getting them included in the chain is still very bad, as such a thing happening denies all other validators the ability to fully calculate the state, or to make blocks that interact with the portion of the state that is no longer accessible. Shortly, Merkle proofs are essential for proving that data belongs to a block. significantly reduce the amount of Merkle proofs needed to prove that state transitions are wrong. Intermediate states Being able to construct the Merkle proofs naively implies 100% data availability guarantees, which isn’t desirable. Erasure codes plus random-sampling give strong mathematical guarantees that all the data needed from the block will be available, in particular, if the miner isn’t malicious. A miner trying to trick the network by sharing wrong information can be detected thanks to the multi-dimensional characteristic of Reed-Solomon codes plus Merkle roots of partial data. The mentioned paper, as well as the paper-related post, also indicate that are other options for proving correct state transitions and data-availability. But those are different and exciting monsters. zk-SNARKS/zk-STARKS Most of what I mentioned in this article is a simple summary of the full idea within those resources. I learned a lot from reading them, so I encourage you to do the same!