An engineer’s guide to ETH2.0
What is ETH2.0?
ETH2.0 is the planned replacement for Ethereum. Over the next several years, ETH2.0’s designers intend to completely subsume Ethereum’s consensus system and state altogether. With such a broad scope, we can’t say precisely what ETH2.0 will or will not include. We do have a few specs, and quite a few teams working on early implementations. At this point, the ETH2.0 designers tentatively plan to include sharding, Casper, state rent, and an eWASM VM. Initial client testing is underway, and a feature-light ETH2.0 testnet is expected to launch within three months (Q1 2019). At first, ETH2.0 will source its Ether (but not its security) from the main Ethereum chain, but designers eventually plan to invert the relationship by making ETH2.0 the main chain, and Ethereum 1.X a shard chain under its management.
So what does this mean for engineers?
If you’re a Solidity or Dapp developer hoping to deploy ETH2.0 smart contracts, expect a lot of changes. ETH2.0 is a complete replacement for Ethereum and will change many of the assumptions we make when writing smart contracts. Its planned multi-year phased rollout bears more resemblance to a product release cycle than an upgrade cycle. The tools and contracts we’ve written for ETH1.X will likely need to be completely redesigned and rewritten for ETH2.0. Fortunately, we have a few years to prepare the ecosystem. To help get the ball rolling, I’d like to discuss the current roadmap and cover some of the engineering ramifications.
Currently, the sharding roadmap (which doubles as the ETH2.0 roadmap) has seven phases listed. Only Phase 0 has a fleshed-out specification, which receives regular updates. The Phase 1 specification is much less precise and does not seem to be under active development yet. After Phase 1 the roadmap becomes a list of goals, rather than a technical document. In Phase 2, for example, the roadmap links to ethresear.ch three times more often than to GitHub. Because anything further out looks like speculation instead of engineering, our concrete discussion is limited to Phases 0, 1, and 2, and we’ve included several rough outlines of possible directions for later phases.
Phase 0 — The Beacon Chain
Phase 0 introduces the “beacon chain.” ETH2.0 designers intend the beacon chain to become the hub of ETH2.0’s ecosystem, becoming the root source of security and validation for all other shards. Once deployed, the beacon chain will run proof of stake using Casper the Friendly Finality Gadget (“Casper FFG”). This early iteration of the beacon chain is designed to be as simple as possible, which is why Phase 0 will not support smart contracts, accounts, or asset transfers, and will not include any shards. Ether on the beacon chain will not be transferable on-chain, which means users will not be able to deposit it in exchanges.
BETH: The New Ether
Beacon ETH (BETH) is a new asset used solely by stakers (‘validators’) on the beacon chain. It is created via two methods: 1) as a reward for validating the beacon chain (and shards, after Phase 1), and 2) by purchase at a rate of 1 ETH per BETH, open to any ETH1.X user via an ETH1.X contract. The contract refers to this as a “deposit.” Engineers may notice that the contract does not have a withdrawal function. This is because there is no way to withdraw BETH from the beacon chain in Phase 0. In other words, once deposited in the ETH1.X validator registration contract, the ETH1.X Ether is effectively burned. Beacon chain validators watch this contract and submit deposit information to the beacon chain, which will issue new BETH to depositors. Therefore we expect new BETH to be issued on the beacon chain shortly after ETH is sent to the validator registration contract. Temporary censorship of a deposit is possible, but permanent censorship is unlikely to occur under Casper’s rules.
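The one-way deposit flow described above can be sketched as a toy model. This is only an illustration, not the actual validator registration contract or beacon chain logic; the names `DepositContract`, `BeaconChain`, and `process_deposits` are hypothetical, and the 1:1 issuance simply mirrors the rate stated in the text.

```python
from dataclasses import dataclass, field


@dataclass
class DepositContract:
    """Toy stand-in for the ETH1.X validator registration contract."""
    burned_eth: int = 0
    log: list = field(default_factory=list)  # deposit events watched by validators

    def deposit(self, depositor: str, amount_eth: int) -> None:
        if amount_eth <= 0:
            raise ValueError("deposit must be positive")
        self.burned_eth += amount_eth
        self.log.append((depositor, amount_eth))
    # Deliberately no withdraw(): in Phase 0 the deposit is one-way.


@dataclass
class BeaconChain:
    """Toy beacon chain that issues BETH 1:1 for observed deposits."""
    beth: dict = field(default_factory=dict)
    processed: int = 0  # index of the next unprocessed deposit log entry

    def process_deposits(self, contract: DepositContract) -> None:
        for depositor, amount in contract.log[self.processed:]:
            self.beth[depositor] = self.beth.get(depositor, 0) + amount
        self.processed = len(contract.log)


contract = DepositContract()
beacon = BeaconChain()
contract.deposit("alice", 32)
beacon.process_deposits(contract)
beacon.process_deposits(contract)  # re-processing issues nothing new
```

Tracking a `processed` index is one simple way to make sure each deposit is issued exactly once; a real design would need to reach consensus on which deposits have been seen.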
Ether transfers on the beacon chain will not be allowed until Phase 2, and I don’t believe there will be any way to move BETH back to ETH1.X until 1.X is completely folded into the shard ecosystem. Given that Phase 0 is incomplete, and no reliable Phase 1 specification exists, it seems reasonable to assume that BETH will persist as an independent and non-transferable asset for at least two years. Once Phase 2 is complete, BETH will be transferable to and from shards; however, ETH will not be. This is unlikely to pose significant economic difficulties. In the past, pre-launch and low-feature tokens like BETH have been traded on exchanges via IOUs. For example, the HitBTC and BitMEX XTZ futures markets launched during the Tezos crowdsale. If there is demand for BETH, we should expect to see a vibrant ecosystem of exchanges supporting custodial BETH trading and staking. However, demand for BETH seems questionable. BETH makes a poor investment, as the one-way peg from ETH to BETH gives BETH a price ceiling of 1 ETH. In other words, BETH can never be worth more than ETH but can be worth less.
Phase 0+ — Staking
Users may stake 32 BETH on the beacon chain to become a validator. In Phase 0, validators will manage only the beacon chain. From Phase 1 onward, validators will also manage 1,024 shard chains. The beacon chain (and each shard chain) will use Casper FFG to finalize blocks. FFG is a Proof of Stake algorithm implementing stake slashing for bad behavior like chain halts and censorship. Astute readers will have noticed FFG’s cousin, Casper CBC, in the “Ethereum 3.0” section of the sharding roadmap. While a full discussion of FFG (and certainly of CBC!) is outside of the scope of this post, I’d recommend reading Vitalik’s note on hybrid PoW/FFG, his Medium post on minimal slashing conditions, and the FFG paper.
What do stakers do?
Sharding aims to split (shard) state information across nodes, without requiring any node to have a full picture of the network. Therefore no validator will validate all shards. Instead, the beacon chain will coordinate validation of all other shards, and all validators will validate the beacon chain. Each epoch (64 blocks or about 6.4 minutes), the beacon chain will shuffle the validators and assign them to a shard. A group of validators assigned to a shard is called a committee. Committees target 128 members. In Phase 0, this means every 6 minutes the beacon chain will select available validators to form a committee for the next 6 minutes. In Phase 1, the beacon chain will appoint a committee of validators for each of 1,024 shards. The precise method for this is complex. It involves a multi-phase random number generation process as well as a verifiable delay function to further frustrate attempts to manipulate the committee selection process.
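To make the epoch arithmetic and the shuffle-then-assign idea concrete, here is a minimal sketch. It is not the beacon chain's actual algorithm: the multi-phase random number generation and verifiable delay function are replaced by a plain seed, the SHA-256-driven shuffle is a stand-in, and the 6-second block time is inferred from "64 blocks or about 6.4 minutes". All function names are hypothetical.

```python
import hashlib

SLOTS_PER_EPOCH = 64
SECONDS_PER_SLOT = 6  # inferred from "64 blocks or about 6.4 minutes"
TARGET_COMMITTEE_SIZE = 128


def shuffle(validators, seed: bytes):
    """Deterministic Fisher-Yates shuffle driven by SHA-256 (illustrative only).

    The modulo step has a negligible bias that a real design would avoid.
    """
    out = list(validators)
    for i in range(len(out) - 1, 0, -1):
        digest = hashlib.sha256(seed + i.to_bytes(8, "big")).digest()
        j = int.from_bytes(digest, "big") % (i + 1)
        out[i], out[j] = out[j], out[i]
    return out


def assign_committees(validators, shard_count: int, seed: bytes):
    """Shuffle the validators, then deal them out to shards round-robin."""
    shuffled = shuffle(validators, seed)
    committees = {shard: [] for shard in range(shard_count)}
    for index, validator in enumerate(shuffled):
        committees[index % shard_count].append(validator)
    return committees


epoch_minutes = SLOTS_PER_EPOCH * SECONDS_PER_SLOT / 60  # 384 seconds per epoch
validators = list(range(4 * TARGET_COMMITTEE_SIZE))
committees = assign_committees(validators, shard_count=4, seed=b"epoch-0")
```

Because the shuffle depends only on the seed, every honest node that agrees on the seed derives the same committees, which is why the quality of the seed matters so much.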
ETH2.0 selects committees randomly and rotates committees often because of their critical work. Committees are responsible for preserving the safety, liveness, and integrity of their shard, as well as attesting to the shard state on the beacon chain. They are the only way the beacon chain can learn the state of shards and vice versa. Selecting them randomly from the pool of all validators minimizes the chance that the committee as a whole will lie or cheat. Rotating them often aims to mitigate the harm that a bad committee can cause. In other words, it should be difficult for malicious or profit-maximizing validators to use committee selection as a tool to attack any part of the network. Moreover, should they gain control of a shard committee through chance, they will control it for no more than 64 blocks.
Proof-of-Stake for engineers
While documentation of the philosophical differences between ETH1.X’s Proof of Work and ETH2.0’s Proof of Stake is an ongoing process, it is worth noting that some of the PoW/PoS feature disparity does affect engineers directly. For example, while PoW chains support stateless SPV proofs and NIPoPoW-summarized tracking of remote state, PoS forbids any low-state communication. Subjectivity prevents state-light attestations. In other words, a remote state proof on Proof-of-Stake will consist of roughly the same amount of data as a PoW stateless SPV proof but requires prior validation of the entire PoS history. Stateless SPV proofs, in contrast, need no other information to validate. This means that cross-shard or cross-chain applications have reduced functionality and increased overhead in a subjective Proof of Stake environment.
Phase 1 — Sharding
Phase 1 aims to create consensus about the contents of shard chains, but not about their meaning. In other words, it’s a trial run for the sharding structure, rather than an attempt to use shards to scale. The beacon chain will treat shard chain blocks as simple collections of bits with no structure or meaning. Shard chains will not yet have accounts, assets, or smart contracts. Shard validators, who are randomly selected by the beacon chain for each shard at each epoch, merely come to agreement on each block’s content. It doesn’t matter what information appears in shard blocks, so long as all committees reach consensus and update the beacon chain on the shard regularly.
Shard validators attest to the shard’s contents and state through a process known as crosslinking. Simply put, the committee must include verifiable information about the shard (like a Merkle root) in the beacon chain. In Phase 2 or later, crosslinking will support cross-shard communication. Once the beacon chain has received evidence of a given crosslink’s accuracy from multiple committees, the beacon chain can trust that the crosslink is a truthful representation of the shard without validating the entire shard. If committees disagree on the validity of a crosslink, clearly one committee is faulty and should be slashed. This is the root of security for all shards: misbehavior by their validators will eventually be found and punished by the beacon chain.
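As a rough illustration of the "verifiable information" a crosslink might carry, the sketch below computes a Merkle root over a shard's blocks. The tree layout (SHA-256, duplicating the last node on odd levels) is an assumption chosen for brevity, not the format in any ETH2.0 specification.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Binary Merkle root; an odd node at any level is paired with itself."""
    if not leaves:
        return sha256(b"")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


shard_blocks = [b"block-0", b"block-1", b"block-2"]
honest_root = merkle_root(shard_blocks)
# A committee that attests to different contents produces a different root.
tampered_root = merkle_root([b"block-0", b"block-X", b"block-2"])
```

Two committees that saw the same shard data will always submit the same 32-byte root, so any disagreement between roots is direct evidence that someone is faulty.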
Phase 1 doesn’t have anything particularly interesting in it. Fundamentally it’s a bootstrapping phase for crosslinking, and the symmetric mechanism by which shards reference the beacon chain. The designers seem confident that these mechanisms will work. The major open questions revolve around specification and implementation strategy. Given that Phase 0 has taken over a year to reach a reasonable level of specification, I would expect Phase 1 to take a similar amount of time. Interestingly, Phase 0 implementation has happened concurrently with specification. Even today, less than three months from testnet, the Phase 0 specification changes regularly. This implies that future ETH2.0 phases will have extremely high variance in development time. While optimists have told me six months, it is easy to see Phase 1 taking 12–18 months of development after Phase 0 enters testing.
Phase 2 — Smart Contracts
Phase 2 finally brings a system resembling the Ethereum we’re familiar with. With the release of Phase 2, shard chains transition from simple data containers to a structured chain state. This is when BETH will become transferable and smart contracts will be reintroduced. Each shard will manage a virtual machine based on eWASM (we’ll call it “EVM2”). We expect EVM2 to support accounts, contracts, state, and other abstractions that we’re familiar with from Solidity. However, massive behind-the-scenes changes are likely to break most existing tools. Fortunately, the eWASM team has done some groundwork for solc, truffle, and ganache. We can expect to see most familiar tools ported to support EVM2 before or during Phase 2’s testnet.
State rent, a very likely inclusion for Phase 2, poses some interesting challenges to present-day Solidity engineers. Rather than being able to store code and data indefinitely, state rent would require contract developers and users to pay for EVM2 storage over time. This prevents state bloat, by ensuring that unused information falls out of the state over time. The goal is to make the user, rather than the full node, pay for the costs of state. Many different models have been suggested, with no clear winner.
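Since no rent model has been chosen, the following is only one hypothetical shape it could take: a flat charge per byte per block, with eviction when a contract's prepaid balance runs out. The rate, the `Contract` fields, and the eviction rule are all assumptions made for illustration.

```python
from dataclasses import dataclass

RENT_PER_BYTE_PER_BLOCK = 1  # hypothetical rate, in arbitrary units


@dataclass
class Contract:
    storage_bytes: int  # assumed to be greater than zero
    rent_balance: int
    last_paid_block: int = 0
    evicted: bool = False


def collect_rent(contract: Contract, current_block: int) -> None:
    """Charge rent for the elapsed blocks; evict the contract if it cannot pay."""
    if contract.evicted:
        return
    elapsed = current_block - contract.last_paid_block
    owed = elapsed * contract.storage_bytes * RENT_PER_BYTE_PER_BLOCK
    contract.last_paid_block = current_block
    if owed > contract.rent_balance:
        contract.rent_balance = 0
        contract.evicted = True
    else:
        contract.rent_balance -= owed


def blocks_until_eviction(contract: Contract) -> int:
    """How many more blocks the current balance covers."""
    return contract.rent_balance // (contract.storage_bytes * RENT_PER_BYTE_PER_BLOCK)


token = Contract(storage_bytes=100, rent_balance=1_000)
runway = blocks_until_eviction(token)  # balance covers 10 blocks
collect_rent(token, current_block=4)   # pays 400, leaving 600
```

Even in a toy model like this, the design question for dapp developers is visible: someone has to keep topping up `rent_balance`, and the contract needs a policy for who that is.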
Interestingly, with some Ethereum upgrade plans and prominent Ethereum core devs recommending it, state rent may be the only overlap in the disparate roadmaps. Therefore I would strongly recommend planning to pay state rent on currently deployed contracts, and designing models to pass state rent to users in the future. We don’t know the precise design of state rent, but we should plan for the costs.
Beyond that, we don’t know what to expect from Phase 2. It’s still in very early stages of research and includes several major unsolved problems. Given the informal specification and development process, as well as Phase 2’s expanded scope over Phase 1, it doesn’t seem reasonable to suggest that Phase 2 could launch before 2020. Which is to say, while ETH2.0 may launch this year, don’t expect ETH2.0 to support asset transfer or smart contracts until at least 2020.
Phase 3 — Off-chain state storage
Now, in order to talk more about smart contracts, we’ll be skipping over Phase 3 almost entirely. Phase 3 minimizes on-chain state by moving as much as possible off-chain. Rather than the chain storing the entire state, it will store some state information, and an aggregator (aggregators are short bits of data that represent long lists of data; a Merkle tree is a kind of aggregator). Users will be responsible for storing the full state off-chain. When a user wants to interact with the state, they include a proof of the current state with their transaction. That way the resource requirement of running a validating node can be much lower. Several aggregator designs with different properties and performance characteristics are known, but no approach has been selected. At this point we stop being able to leverage on-chain communication to coordinate users, so we have to plan to sync state via some other system. Events become less useful to engineers here, as the chain no longer guarantees data availability. In Phase 3 maintaining and retrieving off-chain state will become a critical design constraint for dapps.
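The "include a proof of the current state with their transaction" idea can be sketched with a Merkle tree, which the text names as one kind of aggregator. This is a minimal illustration under assumed conventions (SHA-256, duplicated last node on odd levels, non-empty leaf list); the eventual ETH2.0 design may use a different structure entirely.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return every level of the tree, leaves first. Requires at least one leaf."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pad odd levels by duplication
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels, index: int):
    """Collect the sibling hash at each level for the leaf at `index`."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof


def verify(root: bytes, leaf: bytes, index: int, proof) -> bool:
    """Recompute the root from a leaf and its sibling path."""
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


# The user stores the full state off-chain; the chain keeps only `root`.
state = [b"alice:10", b"bob:5", b"carol:7"]
levels = build_tree(state)
root = levels[-1][0]
proof = make_proof(levels, 1)  # proof that "bob:5" is in the state
```

The validating node only needs `root` and the short `proof`, which is why its resource requirements can be much lower, but the user is stuck if they lose the off-chain data needed to build that proof.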
Phase 4 — Sharded Contracts
However, one insurmountable problem remains: ETH2.0 contracts, while they will be as powerful as Ethereum contracts, are bound to a single shard and can never directly interact with contracts on another shard. This is a direct consequence of sharding. Sharding’s goal is to split state up between shards, and not require direct knowledge of other shards. It achieves scale by splitting state and minimizing the load on any validator. Direct interaction requires direct knowledge. By design, a shard does not have direct knowledge of other shards. It learns about other shards only via cross-links to the beacon chain. Therefore whenever we want to interact cross-shard, we have to wait for the beacon chain. Concretely, this means that if SafeMath is deployed on Shard A, users on Shard B will either have to wait to access it or deploy a new SafeMath on Shard B.
Simple utilities like SafeMath will be deployed to each shard — 1024 SafeMaths on 1024 shards — but what about marketplaces like Maker or Compound? #DeFi’s promise of composable finance becomes challenging to keep across shard boundaries. A long delay between the opening of a CDP and the receipt of DAI can cause unacceptable financial losses. What if the market moves and the CDP is liquidated before the user ever receives DAI? In practice, this likely means that users will have accounts on every shard containing a compelling smart contract, and cross-shard composition is lost entirely. Maker and 0x can interact only if they are both deployed on the same shard, and the 0x users also have assets on that shard.
Fundamental trade-offs: synchrony or scale
ETH2.0 designers do not know what the cross-shard communication system will look like. From reading many proposals, it appears that there is a fundamental trade-off between immediate feedback and predictability. The nature of sharding can’t change: users must wait for cross-shard communication no matter what. However, we can couple the local and remote execution phases of the transaction on each shard tightly or loosely.
A tight coupling puts the waiting first. The transaction does nothing until the shards have communicated. In contrast, we can loosely couple transactions by executing part now and part later. The transaction executes on the local shard and then executes on the remote shard after the cross-shard communication. Loose coupling presents a better face to the user. They see their transaction’s local execution immediately and know that remote execution will occur at some point in the future. Unfortunately, they cannot know the outcome of a loosely-coupled transaction’s remote phase without waiting. Tightly coupled transactions are more predictable. The user knows more about the outcome because the remote state doesn’t shift between the local and remote execution phases. However, tight coupling requires the user to wait before seeing any result.
We have very little information about ETH2.0’s communication model. We know that it can’t provide cross-shard contract calls without sacrificing almost all scaling benefits. I won’t blame you if you stop reading here, as Phase 4 only has a mind map and a few vague links. A non-obvious consequence of this is that ETH2.0 will not provide significant scaling benefits to complex smart contract systems until Phase 4. Until then, contracts wishing to interact with other contracts must cohabitate a shard and are limited to the speed and scale of that shard. We expect shards to have at best a small constant factor speedup compared to ETH1.X. This means there will be little reason to migrate smart contract code or users until Phase 4 is released, potentially in the mid-2020s, as the advantage will be small. In the meantime, to better understand the trade-offs for engineers and dapp users, I’ve investigated a few proposed models and included short descriptions here. I don’t think any of these will be adopted, but I believe they are helpful for understanding the trade-offs involved. Again: everything below here is speculative.
A Basic Model: Receipts and Proofs
All forms of cross-chain communication leverage the beacon chain. Because the beacon chain commits to the state of all shards, and each shard commits to the state of the beacon chain, we can use it as a hub in the shard chain ecosystem. Messages from one chain to another must in some sense transit through the beacon chain. We don’t want to send the full message, because that would require the beacon chain to process each transaction itself, negating scaling benefits entirely.
Instead, whenever a user or contract on Shard A wants to interact with Shard B, we have Shard A generate a “receipt” with the message. Shard A commits to all of its receipts in its block header. The beacon chain waits for A to finalize and then commits to A’s header (including the commitment to the receipt). Shard B must wait for beacon finalization and then commit to the beacon header. Once this has happened, a new transaction can be submitted to B, including the receipt and a proof. The proof shows that the receipt was included in A, that A was included in the beacon, and that the beacon was included in B. This way the contracts on B can trust the message sent from A. If the contracts on B want to send a response back (maybe a return value, or an error), we repeat the whole process in reverse: Shard B makes a receipt that eventually finds its way back to Shard A.
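The three-link proof described above (receipt in A, A in the beacon, beacon in B) can be sketched as follows. To keep it short, each commitment is a hash over a whole list and the "proof" carries the full lists; a real design would presumably use Merkle proofs so the data stays small. The `Proof` layout and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def commit(items) -> bytes:
    """Commit to an ordered list of byte strings by hashing their hashes."""
    hasher = hashlib.sha256()
    for item in items:
        hasher.update(hashlib.sha256(item).digest())
    return hasher.digest()


@dataclass(frozen=True)
class Proof:
    shard_a_receipts: tuple       # every receipt committed in Shard A's header
    beacon_shard_headers: tuple   # every shard header committed in the beacon header
    shard_b_beacon_header: bytes  # the beacon header that Shard B has committed to


def verify_receipt(receipt: bytes, proof: Proof) -> bool:
    """Check: receipt is in A, A is in the beacon, the beacon is in B."""
    shard_a_header = commit(proof.shard_a_receipts)
    beacon_header = commit(proof.beacon_shard_headers)
    return (
        receipt in proof.shard_a_receipts
        and shard_a_header in proof.beacon_shard_headers
        and beacon_header == proof.shard_b_beacon_header
    )


receipts = (b"send 5 to bob on shard B", b"some other receipt")
a_header = commit(receipts)
shard_headers = (a_header, commit((b"unrelated shard",)))
beacon_header = commit(shard_headers)
proof = Proof(receipts, shard_headers, beacon_header)
```

Each link can only be checked once the corresponding header is finalized, which is where the waiting comes from.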
It’s easy to see why this process takes time. Each of the four steps of communication requires waiting several minutes for finalization! Unfortunately, we can’t avoid the waiting entirely. If we want to be sure of the remote state, then we have to wait for finality at every step. The best case for round-trip communication is four finality cycles. That said, the user gets confidence after three cycles because the user can see Shard B’s receipts before Shard A can see them. With ETH2.0’s 6.4-minute epoch length, users must wait about 19 minutes to see the outcome and about 26 minutes to get the result on-chain.
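The waiting times quoted above follow from simple multiplication, assuming (as the text does) that one finality cycle takes one 6.4-minute epoch. The helper name is hypothetical.

```python
EPOCH_MINUTES = 6.4  # epoch length quoted in the text


def wait_minutes(finality_cycles: int) -> float:
    """Best-case wait, treating each finality cycle as one full epoch."""
    return finality_cycles * EPOCH_MINUTES


user_confidence = wait_minutes(3)  # user can see the outcome: about 19 minutes
on_chain_result = wait_minutes(4)  # result lands back on Shard A: about 26 minutes
```

These are best-case figures; any delay in finalization on either shard or on the beacon chain adds to them.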
Concrete Receipts: Token Migration Between Shards
ERC20 tokens’ versatility has made them ubiquitous in Ethereum today. However, ETH2.0 poses some logical problems for tokens. Because a smart contract manages all token balances, and that smart contract exists only on a single shard, tokens from Shard A don’t exist at all on Shard B. However, with some clever cross-shard communication, we can deploy the same token on several shards and allow migration of tokens between shards — effectively making a two-way peg between token contracts.
The scheme is pretty simple: when deploying our token (let’s call it “Cool Cross-shard Token” or “CCT”), we’ll use ERC20 with two small additions: migrateSend and migrateReceive functions. We’ll have migrateSend burn tokens and generate a receipt. The receipt will include the number of tokens burned and the receiving shard. We’ll have migrateReceive validate the receipt and mint the same number of CCT. Then we’ll deploy the same token contract on each shard. Now we can effectively migrate tokens between shards by calling migrateSend to burn on one, and then calling migrateReceive to mint on the other. We will need to redeploy our token contract on each shard, but that seems worth it. Migrations, being one-way, take at least two finality periods of cross-shard communication. So after we call migrateSend it will be about 13 minutes (two 6.4-minute finality cycles) before our CCT are usable on the receiving shard.
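Here is a minimal sketch of the burn-and-mint migration, written in Python rather than Solidity since EVM2 tooling does not exist yet. The `proven` flag stands in for validating the cross-shard receipt proof, and the nonce-based replay protection is an addition of this sketch, not something specified in the scheme above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Receipt:
    nonce: int
    owner: str
    amount: int
    from_shard: int
    to_shard: int


@dataclass
class CCT:
    """Toy per-shard token contract with migration hooks."""
    shard_id: int
    balances: dict = field(default_factory=dict)
    next_nonce: int = 0
    redeemed: set = field(default_factory=set)

    def migrate_send(self, owner: str, amount: int, to_shard: int) -> Receipt:
        if amount <= 0 or self.balances.get(owner, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[owner] -= amount  # burn on the sending shard
        receipt = Receipt(self.next_nonce, owner, amount, self.shard_id, to_shard)
        self.next_nonce += 1
        return receipt

    def migrate_receive(self, receipt: Receipt, proven: bool) -> None:
        # `proven` is a placeholder for checking the receipt's cross-shard proof.
        if not proven:
            raise ValueError("receipt proof not valid")
        if receipt.to_shard != self.shard_id:
            raise ValueError("receipt is for another shard")
        key = (receipt.from_shard, receipt.nonce)
        if key in self.redeemed:
            raise ValueError("receipt already redeemed")
        self.redeemed.add(key)
        self.balances[receipt.owner] = self.balances.get(receipt.owner, 0) + receipt.amount


shard_a = CCT(shard_id=0, balances={"alice": 10})
shard_b = CCT(shard_id=1)
receipt = shard_a.migrate_send("alice", 4, to_shard=1)
shard_b.migrate_receive(receipt, proven=True)  # in practice, after the finality wait
```

Total supply across both shards is unchanged by the migration, which is the property that makes this behave like a two-way peg.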
Receipts are a general way of moving information across shards. We can put just about any on-chain information in a receipt. This includes entire smart contracts. Yanking is a proposal to migrate contracts across shards by including the contract’s code and storage in a receipt. The contract would be deleted (“yanked”) from Shard A and then redeployed on Shard B after the receipt makes its way over there. Once on Shard B, it can communicate directly with Shard B’s contracts and interact with Shard B’s state. It could even be yanked back to Shard A.
This would allow any smart contract to communicate with any other (after the cross-shard wait time). Unfortunately, because the receipt includes the whole contract and all its storage, it can get costly to move large or popular contracts. And while the receipt is in transit, the contract is entirely unusable. It has been yanked from Shard A but hasn’t yet reached Shard B. This means that all other users are locked out of that contract until it reaches Shard B. And even then, only users already on Shard B can interact with it. As a result, yanking is most suited to small contracts with few users. It makes tightly-coupled execution possible but is far from a general solution.
From here we move to more exotic constructions. Receipts are designed to make asynchronous (loosely-coupled) communication possible. However, we may want synchronous communication as well. For that, we have to get a bit more creative. Shard pairings are a simple design that gets us tightly coupled execution with minimal fuss.
Shard pairing is a simple scheme. As described in the third paragraph of this post, we shuffle shards into synchronous pairs at each height. Every time a shard is paired with another, users of either shard can execute tightly coupled state updates across them. This means that if Shards A and B are paired at height 7, all validators of A and B must know all state of A and B, and the shards must advance together or not at all. In this model, if you need a cross-chain transaction between A and B, you would wait for A and B to be randomly paired. Vitalik describes the 100 shard case. With 1,024 shards, we expect it to take 512 blocks (about one hour), but because pairings are random, it could take much longer or much shorter. As Vitalik notes, this scales very poorly when you want to interact with multiple shards.
Zones are a broader version of pairings. Each epoch we split shards into a few “zones” composed of multiple shards. Zones must proceed synchronously, which means that all shards in a zone update their local state together. By proceeding synchronously, zones provide free movement between shards and direct interaction with any contract in the zone, but no advantage for communication with any shard outside your zone. In addition, because zones require validators to know the state of all shards in the zone, they negate many of the scaling advantages of sharding. If a zone is composed of 16 shards, we sacrifice roughly 15/16 (=94%) of the scale advantage while gaining tight coupling of execution with only 15/1,024 (=1%) of the total network.
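The zone trade-off is plain arithmetic, and it can be checked for other zone sizes with a small helper. The function name is hypothetical; the formulas are the ones implied by the 16-shard example in the text.

```python
from fractions import Fraction


def zone_tradeoff(zone_size: int, total_shards: int):
    """Return (fraction of scale advantage lost, fraction of network tightly coupled)."""
    scale_lost = Fraction(zone_size - 1, zone_size)
    reachable = Fraction(zone_size - 1, total_shards)
    return scale_lost, reachable


scale_lost, reachable = zone_tradeoff(zone_size=16, total_shards=1024)
# scale_lost is 15/16 (roughly 94%); reachable is 15/1024 (roughly 1%)

# A shard pairing is the same calculation with a zone of two shards.
pair_lost, pair_reachable = zone_tradeoff(zone_size=2, total_shards=1024)
```

Growing the zone raises the cost much faster than it raises the reach, which is why zones do not look like a general answer to cross-shard calls.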
A non-obvious property of cross-shard (and cross-chain) communication is that users reach confidence in a message faster than the chains involved. Alice, sending 5 BETH from Shard A to Shard B, knows that it will arrive as soon as she sends it. Bob, seeing the send, knows that the BETH will reach Shard B as soon as the send is finalized on Shard A. Shard B and its contracts, however, must wait several minutes for the beacon chain to reach finality on Shard A’s finalization. This implies that a sophisticated optimistic wallet can accept and spend funds on Shard B as soon as they are spent on Shard A. In other words, Bob will take an enforceable IOU from Alice’s wallet on Shard B, because Bob has a high degree of confidence that Alice has already sent enough BETH to cover it. If enough users of Shard B are willing to observe Shard A and accept standardized IOUs, then Shard A BETH may be spendable on Shard B very quickly after being sent. However, this scheme becomes exceptionally complex when applied to smart contracts, as state is never fungible, and IOUs for state are impossible, so it does not suit general interaction. This means we should regard this scheme, known as encumbrances, as a UX improvement within loose coupling. It allows loose coupling to simulate tight coupling with fast execution for some transactions.
Divorcing consensus and state
One of the more complex and intellectually stimulating possibilities is that the consensus process will be divorced from the state update process. Today, Ethereum miners and full nodes accept blocks only after performing all state updates contained in the block. That doesn’t have to be the case. Instead, they could accept blocks but update state later. In this case, rather than achieving consensus on the state of the system as we do in Ethereum, we would reach consensus on the total history (or “total order”) of all transactions across all shards. Doing this means each shard can add blocks quickly without knowing the state of any other shard, which is how sharding generates scaling advantages. However, the transaction’s effect on the state of the shard and the network as a whole will not be known until all shards have finalized. In other words, the finalization of state lags behind the finalization of shard contents.
From a user’s perspective: we would submit transactions immediately, and we know that they are included, but we must wait to be certain of the outcome of that transaction. As shards finalize, we get progressively more information about state, but cannot be entirely sure until all shards have reached finality. Similar to encumbrances, users may in some cases be certain of the outcome of a transaction in advance of the chain and act accordingly.
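The separation of ordering from execution described in this section can be illustrated with a toy ledger: consensus fixes only the order of transactions, and state is derived later by replaying that order. The transfer format and function names are hypothetical, and real cross-shard execution would be far more involved.

```python
def apply_tx(state: dict, tx: tuple):
    """Apply one (sender, recipient, amount) transfer; return (new_state, succeeded)."""
    sender, recipient, amount = tx
    if state.get(sender, 0) < amount:
        return state, False  # included in the history, but fails on execution
    new_state = dict(state)
    new_state[sender] -= amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state, True


def execute(genesis: dict, ordered_txs: list):
    """Derive state by replaying an already-agreed total order of transactions."""
    state = dict(genesis)
    outcomes = []
    for tx in ordered_txs:
        state, ok = apply_tx(state, tx)
        outcomes.append(ok)
    return state, outcomes


genesis = {"alice": 5}
pay_bob = ("alice", "bob", 5)
pay_carol = ("alice", "carol", 5)

# Both transactions are "included" either way; only the final order decides which succeeds.
state_1, outcomes_1 = execute(genesis, [pay_bob, pay_carol])
state_2, outcomes_2 = execute(genesis, [pay_carol, pay_bob])
```

Inclusion alone tells Alice nothing about which payment went through; she has to wait until the order is final, which is the lag between finalizing contents and finalizing state.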
Conclusion & Engineering Direction
ETH2.0 will be a completely different system from Ethereum. They will both exist in parallel for years and have widely different feature sets. For the near future, expect a one-way peg from ETH to BETH. If you run an exchange or custody service, consider how you can support BETH custodial trading and staking for your users before it is transferable on-chain. For the longer term, consider how your smart contracts will adapt to shards with and without cross-shard communication. Above all, keep tabs on the research and development process. ETH2.0 is a complex and evolving system. All dapp engineers need a clear understanding of ETH2.0 plans and progress.
James Prestwich (@_prestwich) is an engineer and curry enthusiast. He is the founder of Summa, a leading provider of on-chain and cross-chain financial services.
Learn more at summa.one, or contact us at firstname.lastname@example.org.