The Ethereum Beacon Chain as a State Machine

Arguably the most important event in Ethereum's history, the Merge, took place on September 15, 2022. It marked the network's transition from Proof of Work to Proof of Stake, fundamentally changing how Ethereum reaches consensus. But why is it called ‘the Merge’ and not ‘the transition’?

Before the Merge, Ethereum ran on Proof of Work (PoW), a consensus mechanism that required ‘miners’ to solve computationally expensive cryptographic puzzles in order to validate transactions and create new blocks. PoW is cool, but it comes with a lot of limitations: it burns immense computational power, which means high electricity usage and environmental concerns; its transaction throughput is low because of its slower verification time, which is a challenge for institutional adoption; and it carries a centralization risk, because mining tends to consolidate into a few powerful entities. These and other reasons pushed the Ethereum Foundation, just three years after inception, to start building a new consensus mechanism called Proof of Stake (PoS), designed to solve most of the issues that challenged PoW.

On December 1, 2020, Ethereum launched its first version of PoS as a new chain called the Beacon Chain. The Beacon Chain didn’t process user transactions. Its sole purpose was to coordinate validators and reach consensus using a new mechanism called Gasper. Transactions were still being processed on the main Proof of Work chain, so both chains, the Ethereum main chain and the Beacon Chain, were running in parallel. For nearly two years, these two chains operated independently. Then, on September 15, 2022, the original chain dropped its mining-based consensus and plugged directly into the Beacon Chain. Thus, two chains became one. That is why it’s called the Merge, not the transition.

Today, Ethereum operates as a two-layer blockchain. The consensus layer, which used to be the Beacon Chain, handles block proposals, attestations, and finality. The execution layer, the original Ethereum chain, handles transaction processing. You might have heard these referred to as Eth2 and Eth1, respectively, but the Ethereum Foundation deprecated that naming because it implied two separate networks rather than two layers of one system. This article focuses on the consensus layer, specifically on how the Beacon Chain operates as a state machine.

What is the BeaconState?

To understand how the state machine transitions, we first need to understand what the state actually contains at the genesis of the chain. The Beacon Chain’s state is represented by a single object called the BeaconState. It holds everything the consensus layer needs to function. It is sometimes referred to as the “God object”. The spec itself groups the fields by purpose, which is the clearest way to walk through them.

class BeaconState(Container):
    genesis_time: uint64
    genesis_validators_root: Root
    slot: Slot
    fork: Fork
    latest_block_header: BeaconBlockHeader
    block_roots: Vector[Root, SLOTS_PER_HISTORICAL_ROOT]
    state_roots: Vector[Root, SLOTS_PER_HISTORICAL_ROOT]
    historical_roots: List[Root, HISTORICAL_ROOTS_LIMIT]
    eth1_data: Eth1Data
    eth1_data_votes: List[Eth1Data, EPOCHS_PER_ETH1_VOTING_PERIOD * SLOTS_PER_EPOCH]
    eth1_deposit_index: uint64
    validators: List[Validator, VALIDATOR_REGISTRY_LIMIT]
    balances: List[Gwei, VALIDATOR_REGISTRY_LIMIT]
    randao_mixes: Vector[Bytes32, EPOCHS_PER_HISTORICAL_VECTOR]
    slashings: Vector[Gwei, EPOCHS_PER_SLASHINGS_VECTOR]
    previous_epoch_attestations: List[PendingAttestation, MAX_ATTESTATIONS * SLOTS_PER_EPOCH]
    current_epoch_attestations: List[PendingAttestation, MAX_ATTESTATIONS * SLOTS_PER_EPOCH]
    justification_bits: Bitvector[JUSTIFICATION_BITS_LENGTH]
    previous_justified_checkpoint: Checkpoint
    current_justified_checkpoint: Checkpoint
    finalized_checkpoint: Checkpoint

Versioning

genesis_time: uint64
genesis_validators_root: Root
slot: Slot
fork: Fork

The first four fields answer the question “what chain are we, and where are we on that chain?”. They anchor the chain’s identity and tell every node which protocol rules to follow. Genesis time is a Unix timestamp that is set at the beginning of the chain and never changes. The Beacon Chain genesis timestamp is 1606824023, which is exactly December 1, 2020, at 12:00:23 PM UTC. If you’ve ever queried ‘block.timestamp’ from a smart contract, that value has, since the Merge, been derived from this field: it is the genesis time plus 12 seconds for every slot that has elapsed.
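To make the link between genesis time and timestamps concrete, here is a minimal sketch of the arithmetic. The constant names follow the spec; the two function names are just picked for this example.

```python
GENESIS_TIME = 1606824023  # mainnet Beacon Chain genesis (Unix seconds)
SECONDS_PER_SLOT = 12


def slot_timestamp(slot: int, genesis_time: int = GENESIS_TIME) -> int:
    """Unix timestamp at which the given slot starts."""
    return genesis_time + slot * SECONDS_PER_SLOT


def slot_at_time(timestamp: int, genesis_time: int = GENESIS_TIME) -> int:
    """Slot that is in progress at the given Unix timestamp."""
    if timestamp < genesis_time:
        raise ValueError("timestamp is before genesis")
    return (timestamp - genesis_time) // SECONDS_PER_SLOT
```

Since both directions are pure arithmetic, any node can work out the current slot from its own clock without asking anyone else.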

The genesis validators root, just like the timestamp, is fixed at the beginning of the chain. It acts as a domain separator: it is mixed into the signing domain of every message a validator signs, such as block proposals and attestations, so that a signature made for Ethereum mainnet is not valid on any other chain.

A slot is simply a counter that tells us where the chain is in time. It increments every 12 seconds, whether or not a block is produced. Fork is an object with three fields: the previous fork version, the current fork version, and the epoch at which the current version took effect. When the first upgrade on the Beacon Chain took place on October 27, 2021, the versions switched from Phase 0 to Altair. The current version, as of the time of writing this article, is Fulu, and the previous version is Electra. Just like the genesis validators root, the fork version is mixed into the signing domain to differentiate one fork version from another. An epoch, on the other hand, is a bundle of 32 slots, that is 12 sec x 32 = 384 seconds, or 6.4 minutes. This is where finality checks, slashing penalties, exit queue handling, and all the other cool consensus-specific stuff take place. This is where Casper FFG, the finality part of Gasper, operates.
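The slot and epoch arithmetic is simple enough to sketch in a few lines. The constants are the mainnet values mentioned above; the helper names are chosen for this example and only loosely follow the spec.

```python
SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12
SECONDS_PER_EPOCH = SLOTS_PER_EPOCH * SECONDS_PER_SLOT  # 384 seconds, i.e. 6.4 minutes


def epoch_at_slot(slot: int) -> int:
    """Epoch that contains the given slot."""
    return slot // SLOTS_PER_EPOCH


def start_slot_of_epoch(epoch: int) -> int:
    """First slot of the given epoch."""
    return epoch * SLOTS_PER_EPOCH


def is_epoch_start(slot: int) -> bool:
    """True if the slot is the first slot of its epoch."""
    return slot % SLOTS_PER_EPOCH == 0
```

For example, slots 0 to 31 belong to epoch 0, and slot 32 is the first slot of epoch 1.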

History

latest_block_header: BeaconBlockHeader
block_roots: Vector[Root, SLOTS_PER_HISTORICAL_ROOT]
state_roots: Vector[Root, SLOTS_PER_HISTORICAL_ROOT]
historical_roots: List[Root, HISTORICAL_ROOTS_LIMIT]

This section answers the question, “What has happened on this chain?”. These variables give the chain a compact memory of its own past, allowing validators to reference and verify previous states without storing everything.

The latest block header stores the header of the most recently processed block. It keeps the chain linked together: before processing a new block, the chain checks that the block’s parent root matches the root of the latest block header and that the block’s slot is later than the header’s slot, which also rules out a second block for the same slot.

The block roots and state roots fields are fixed-size vectors of 8,192 entries that store recent block roots and state roots, respectively. In every slot, the roots are written to their respective vectors at the index slot % 8192, so once a vector is full, new entries overwrite the oldest ones. This allows the chain to look up what the state looked like at any recent slot within a window of about 27 hours (8,192 x 12 seconds). Each time the two vectors fill up, historical roots appends a single root that commits to both of them. The list is capped at 2^24 entries, which is unbounded for all practical purposes, and it grows slowly, with only one entry roughly every 27 hours.
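Here is a small sketch of how that circular indexing behaves. The bounds check mirrors the condition the spec uses when it looks up a past block root; the function names are my own.

```python
SLOTS_PER_HISTORICAL_ROOT = 8192
SECONDS_PER_SLOT = 12

# Length of the lookup window in hours: 8192 slots * 12 s / 3600
WINDOW_HOURS = SLOTS_PER_HISTORICAL_ROOT * SECONDS_PER_SLOT / 3600


def root_index(slot: int) -> int:
    """Position in block_roots / state_roots where the root for `slot` is stored."""
    return slot % SLOTS_PER_HISTORICAL_ROOT


def can_look_up(target_slot: int, current_slot: int) -> bool:
    """True if the root for `target_slot` has been written and not yet overwritten."""
    return target_slot < current_slot <= target_slot + SLOTS_PER_HISTORICAL_ROOT
```

Slot 8192 lands on index 0 again, which is exactly the moment the entry for slot 0 gets overwritten.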

Eth1

eth1_data: Eth1Data
eth1_data_votes: List[Eth1Data, EPOCHS_PER_ETH1_VOTING_PERIOD * SLOTS_PER_EPOCH]
eth1_deposit_index: uint64

Before the Merge, Eth2 (the Beacon Chain) needed to track what was happening on Eth1 (the PoW chain), specifically the deposit transactions in which new validators locked up 32 ETH. You might be wondering why the 32 ETH that was meant for PoS was locked on the PoW chain instead of the Beacon Chain. The answer is simply that the Beacon Chain itself had no token transfer or transaction processing ability, so it couldn’t accept deposits natively.

Eth1 data contains three subfields: the deposit root, which is the Merkle root of the deposit contract’s deposit tree; the deposit count, the total number of deposits made to the contract; and the block hash, the hash of the Eth1 block being referenced.

The Beacon Chain can’t just trust one validator’s view of the Eth1 chain, because different validators might see different states due to network delays. So it uses a voting mechanism in which each block proposer includes their view of the current Eth1 data in their block. Those votes accumulate in this list over a voting period, and if any value gets more than half of the possible votes in that period, it becomes the new Eth1 data. At the end of the voting period, the list is cleared and voting starts fresh.
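The tallying rule can be sketched as below. Strings stand in for real Eth1Data objects, and the 64-epoch voting period is the mainnet configuration as far as I know, so treat that constant as an assumption. Note that the threshold is measured against the number of slots in the period, not against the number of votes cast so far.

```python
SLOTS_PER_EPOCH = 32
EPOCHS_PER_ETH1_VOTING_PERIOD = 64  # assumed mainnet value
SLOTS_PER_VOTING_PERIOD = EPOCHS_PER_ETH1_VOTING_PERIOD * SLOTS_PER_EPOCH  # 2048


def process_eth1_vote(current_eth1_data: str, votes: list, new_vote: str) -> str:
    """Record one proposer's vote and return the (possibly updated) Eth1 data.

    `votes` plays the role of state.eth1_data_votes and is mutated in place.
    """
    votes.append(new_vote)
    # A value wins once it holds more than half of all slots in the period.
    if votes.count(new_vote) * 2 > SLOTS_PER_VOTING_PERIOD:
        return new_vote
    return current_eth1_data
```

With 2,048 slots per period, a value needs 1,025 matching votes before it replaces the current Eth1 data.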

The Eth1 deposit index tracks how many deposits from the deposit contract have been processed so far. When the chain processes a new block, it checks whether there are unprocessed deposits by comparing this index against the deposit count field in Eth1 data. If the deposit count is higher, the block must include the next deposits, up to the maximum of 16 deposits per block.
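The number of deposits a block is required to carry follows directly from those two counters. A minimal sketch, with a function name of my own choosing:

```python
MAX_DEPOSITS = 16  # maximum deposits per block


def expected_deposit_count(deposit_count: int, eth1_deposit_index: int) -> int:
    """How many deposits the next block must include.

    deposit_count comes from state.eth1_data, eth1_deposit_index from the state.
    """
    return min(MAX_DEPOSITS, deposit_count - eth1_deposit_index)
```

A block that carries fewer or more deposits than this number is invalid, so a proposer cannot quietly ignore pending deposits.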

Registry

validators: List[Validator, VALIDATOR_REGISTRY_LIMIT]
balances: List[Gwei, VALIDATOR_REGISTRY_LIMIT]

These two fields store who is participating in consensus and how much stake they have. One cool fact about the validators field is that it only grows and never shrinks: even after a validator withdraws, their entry stays on the list. At the time of writing, there are 2,210,484 entries, of which only 962,941 are active.

The Validator object has eight subfields: pubkey, the validator’s public key; withdrawal credentials, which determine where their stake goes when they withdraw; effective balance, their balance rounded down to a whole number of ETH and capped at the maximum effective balance, which is used for calculating rewards and penalties and is only updated at epoch boundaries, with hysteresis to prevent it from flickering up and down; slashed, a boolean flag that denotes whether the validator has been slashed; activation eligibility epoch, the epoch when the validator became eligible to be activated; activation epoch, the epoch when they were activated; exit epoch, the epoch when they left; and finally withdrawable epoch, the epoch when their balance can be withdrawn.

The reason we have both an effective balance field in the validator record and a balances field directly in the beacon state is that the effective balance does not update the moment the actual balance does; there is a buffer to prevent it from going back and forth. Without hysteresis, a validator hovering around 32 ETH, say fluctuating between 31.99 and 32.01 every epoch due to rewards and penalties, would have their effective balance flipping between 31 and 32 every epoch. That would mean re-Merkleizing the validator object constantly and changing their weight in committee calculations every epoch.
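A sketch of the hysteresis rule helps here. The constants below are the Phase 0 values as I understand them (a quarter-ETH step, with a downward threshold of 0.25 ETH and an upward threshold of 1.25 ETH), and the function name is made up for this example.

```python
GWEI_PER_ETH = 10**9
EFFECTIVE_BALANCE_INCREMENT = 1 * GWEI_PER_ETH
MAX_EFFECTIVE_BALANCE = 32 * GWEI_PER_ETH  # Phase 0 cap
HYSTERESIS_QUOTIENT = 4
HYSTERESIS_DOWNWARD_MULTIPLIER = 1
HYSTERESIS_UPWARD_MULTIPLIER = 5


def updated_effective_balance(balance: int, effective_balance: int) -> int:
    """Return the effective balance after one epoch-boundary update (all values in Gwei)."""
    step = EFFECTIVE_BALANCE_INCREMENT // HYSTERESIS_QUOTIENT  # 0.25 ETH
    downward = step * HYSTERESIS_DOWNWARD_MULTIPLIER           # 0.25 ETH
    upward = step * HYSTERESIS_UPWARD_MULTIPLIER               # 1.25 ETH
    moved_down = balance + downward < effective_balance
    moved_up = effective_balance + upward < balance
    if moved_down or moved_up:
        rounded = balance - balance % EFFECTIVE_BALANCE_INCREMENT
        return min(rounded, MAX_EFFECTIVE_BALANCE)
    return effective_balance
```

With these thresholds, a validator at 31.99 ETH keeps an effective balance of 32, and one at 32.01 ETH with an effective balance of 31 stays at 31, so the small wobble around 32 ETH never touches the effective balance.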

Randomness

randao_mixes: Vector[Bytes32, EPOCHS_PER_HISTORICAL_VECTOR]

randao_mixes is a fixed-size vector of 65,536 (2^16) entries, one per epoch. Every time a validator proposes a block, they include what we call a ‘randao reveal’ in it. This reveal is basically the current epoch number signed by the validator. The chain verifies the signature, hashes it, and XORs the hash with the current epoch’s mix, which produces a new mix. Every proposer in an epoch does the same thing, and the final accumulated mix of that epoch is carried into the next one.
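The mixing step is just a hash and an XOR. In the sketch below, arbitrary bytes stand in for the reveal (on the real chain it is a BLS signature), and the spec hashes the reveal with SHA-256 before XOR-ing it into the mix. The function names are my own.

```python
import hashlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def mix_in_reveal(current_mix: bytes, randao_reveal: bytes) -> bytes:
    """Fold one proposer's reveal into the current epoch's 32-byte mix."""
    return xor_bytes(current_mix, hashlib.sha256(randao_reveal).digest())
```

Because XOR is its own inverse, no single proposer can cancel out the contributions of the others; the most they can do is decide whether to contribute their own reveal at all.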

The randao mix is used to determine the committees and block proposers for upcoming epochs. The committees, which are all the active validators divided across the 32 slots, are determined by the ‘swap-or-not’ shuffle algorithm. This algorithm pseudo-randomly permutes the validator indices, using the mix as its source of randomness. For block proposer selection, the chain hashes the randao mix to form a seed. Then it iterates through the active validators in an order derived from that seed. For each candidate, it draws a random byte from a hash of the seed and checks it against the candidate’s effective balance relative to the maximum effective balance. If the candidate passes, they become the proposer; if not, the chain moves on to the next candidate. In practice it finds one quickly, since most active validators have a 32 ETH effective balance.
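Here is a simplified sketch of the proposer sampling loop. The acceptance test (effective balance times 255 compared against the maximum effective balance times a random byte) follows the Phase 0 spec as I understand it. The candidate order, however, is a deliberate simplification: the spec walks candidates in swap-or-not shuffled order, while this sketch just starts at a seed-derived offset. pick_proposer is a hypothetical name.

```python
import hashlib

MAX_EFFECTIVE_BALANCE = 32 * 10**9  # Gwei, Phase 0 value
MAX_RANDOM_BYTE = 2**8 - 1


def pick_proposer(effective_balances: dict, active_indices: list, seed: bytes) -> int:
    """Sample a proposer with probability proportional to effective balance."""
    if not active_indices:
        raise ValueError("no active validators")
    total = len(active_indices)
    # Simplification: seed-derived starting offset instead of the spec's shuffle.
    offset = int.from_bytes(seed[:8], "little") % total
    i = 0
    while True:
        candidate = active_indices[(offset + i) % total]
        # One hash yields 32 random bytes; take the next unused one.
        digest = hashlib.sha256(seed + (i // 32).to_bytes(8, "little")).digest()
        random_byte = digest[i % 32]
        balance = effective_balances[candidate]
        if balance * MAX_RANDOM_BYTE >= MAX_EFFECTIVE_BALANCE * random_byte:
            return candidate
        i += 1
```

A validator at the maximum effective balance always passes the test, which is why the loop usually ends on the first candidate; a validator with half the balance passes roughly half the time.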

Slashings

slashings: Vector[Gwei, EPOCHS_PER_SLASHINGS_VECTOR]

A validator is slashed for one of two reasons: proposing two different blocks for the same slot in an attempt to create a fork, or making contradictory attestations. The slashings field is a fixed-size vector of 8,192 entries, one per epoch. Each entry holds the sum of the effective balances of the validators slashed in that epoch. This field is used to calculate the penalty amount, which scales with how much stake was slashed around the same time.
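To show how the slashings vector feeds into the penalty, here is a sketch of the proportional ("correlation") penalty as I understand it from the spec. The multiplier was 1 in Phase 0 and was raised in later forks, so it is left as a parameter; treat the exact values as assumptions.

```python
EFFECTIVE_BALANCE_INCREMENT = 10**9  # 1 ETH in Gwei


def correlation_penalty(effective_balance: int, slashings: list,
                        total_active_balance: int, multiplier: int = 1) -> int:
    """Penalty (Gwei) for one slashed validator, scaled by the recently slashed stake."""
    adjusted_total = min(sum(slashings) * multiplier, total_active_balance)
    # Work in whole-ETH increments first to keep the integer math well behaved.
    numerator = effective_balance // EFFECTIVE_BALANCE_INCREMENT * adjusted_total
    return numerator // total_active_balance * EFFECTIVE_BALANCE_INCREMENT
```

A lone slashing among many honest validators gives a penalty that rounds down to zero, while a mass slashing of a third of the stake costs each offender a large share of their balance. That is the whole idea: isolated mistakes are cheap, coordinated attacks are expensive.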

Attestations

previous_epoch_attestations: List[PendingAttestation, MAX_ATTESTATIONS * SLOTS_PER_EPOCH]
current_epoch_attestations: List[PendingAttestation, MAX_ATTESTATIONS * SLOTS_PER_EPOCH]

Every active validator attests once per epoch, and these attestations are what drive both the fork choice rule (LMD-GHOST) and the finality mechanism (Casper FFG).

An attestation carries the following fields. The slot is the slot the validator is attesting for, and the committee index identifies the committee the validator belongs to in that slot. The beacon block root is the block the validator considers to be the head of the chain; it is the validator’s LMD-GHOST vote and is used in the fork choice. The source is the checkpoint the validator believes to be the most recent justified one, and the target is the checkpoint at the start of the current epoch; combined, they form the Casper FFG vote that is used for finality. Simply put, the validator is voting that the target epoch should be justified, which in turn lets the source epoch be finalized. The aggregation bits indicate which validators in the committee have attested. Since one bit per validator is far cheaper than a list of indices, the aggregation bits are stored in a bit field. Finally, we have the aggregated signature of the participating validators over the attestation data.

Finality

justification_bits: Bitvector[JUSTIFICATION_BITS_LENGTH]
previous_justified_checkpoint: Checkpoint
current_justified_checkpoint: Checkpoint
finalized_checkpoint: Checkpoint

Finality means a block and all its transactions can no longer be reverted without an enormous cost. In the Beacon Chain, finality is economic rather than absolute: once an epoch is finalized, the only way to revert it is for at least 1/3 of all staked ETH to get slashed, which would cost billions of dollars.

Justification bits is a bit vector of length 4 that tracks whether each of the last four epochs was justified. The previous justified checkpoint is the checkpoint that was justified as of the previous epoch. The current justified checkpoint is the most recently justified checkpoint, while the finalized checkpoint is the most recently finalized checkpoint.

An epoch becomes justified when at least 2/3 of the total active stake attests to a supermajority link pointing to that epoch as the target. Finalization happens when you get two justified checkpoints in a row: the moment the second one gets justified, the first one gets upgraded to finalized, although there are some more nuances to it.

The State Machine

Now that we know what the state contains, we can look at how it changes. Every 12 seconds, a new slot arrives. If a block is proposed, the state transition function takes the current state and that block, runs validation, updates, and outputs a new state. This process is divided into three stages by the spec: slot processing, block processing, and epoch processing.

Slot Processing

Slot Processing runs every time the chain needs to advance from one slot to the next, whether or not a block was produced.

Three things happen when a slot advances from N to N+1. First, the state root for slot N is written into state roots to preserve a record of what the state looked like at that slot. Secondly, the latest block header gets completed: when a block is processed, its header is stored with an empty state root, and that gap is now filled in with the state root just computed (remember, this is the field that the next block’s parent root is checked against). The root of that completed header is then written into block roots. Lastly, the slot counter is incremented by one.

You might be wondering what happens to the state when a proposer misses their slot. Well, the state still gets updated: the block roots entry for that slot, for example, will contain the root of the same latest block header, since no new block arrived to replace it.

As part of advancing the slot, the chain checks whether it is about to cross an epoch boundary, which is easily done by testing whether the next slot is a multiple of 32, that is, (slot + 1) mod 32 == 0. If so, epoch processing kicks in before anything else happens. So for every 32 slots, epoch processing runs alongside slot processing; in other words, epoch processing runs after the last slot of the epoch has been wrapped up but before the first block of the new epoch is processed.
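Putting the slot processing steps together, here is a toy sketch. It follows the spec’s ordering, where the epoch boundary check happens just before the slot counter is incremented. The state is a plain dict, and toy_root is a stand-in for SSZ hash_tree_root (it just hashes the repr), so the roots it produces are not real ones.

```python
import hashlib

SLOTS_PER_EPOCH = 32
SLOTS_PER_HISTORICAL_ROOT = 8192
EMPTY_ROOT = bytes(32)


def toy_root(obj) -> bytes:
    """Stand-in for hash_tree_root: NOT real SSZ Merkleization."""
    return hashlib.sha256(repr(obj).encode()).digest()


def process_slot(state: dict) -> None:
    index = state["slot"] % SLOTS_PER_HISTORICAL_ROOT
    # 1. Record the root of the state as it stands at the end of this slot.
    previous_state_root = toy_root(state)
    state["state_roots"][index] = previous_state_root
    # 2. Complete the latest block header if its state root is still empty.
    header = state["latest_block_header"]
    if header["state_root"] == EMPTY_ROOT:
        header["state_root"] = previous_state_root
    # 3. Record the root of the (now complete) header.
    state["block_roots"][index] = toy_root(header)


def process_slots(state: dict, target_slot: int, process_epoch=lambda s: None) -> None:
    """Advance the state to target_slot, running epoch processing at boundaries."""
    if target_slot <= state["slot"]:
        raise ValueError("target slot must be in the future")
    while state["slot"] < target_slot:
        process_slot(state)
        if (state["slot"] + 1) % SLOTS_PER_EPOCH == 0:
            process_epoch(state)
        state["slot"] += 1
```

Running this across a few empty slots shows the missed-slot behaviour described above: the same header root gets written into block_roots again and again until a new block arrives.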

One last fact to note: both the state roots and block roots vectors are circular, so once they are filled up, every 8,192 slots, new data would start to overwrite the old. Before that happens, the chain hashes the two vectors together and appends the result to historical roots.

Block Processing

Block processing runs when a block is actually proposed for a slot. After slot processing advances the state to the correct slot, block processing takes the signed block and applies its contents to the state. It has two major parts: validating the block header and processing the block body.

Before anything else, the chain checks a few things: that the block’s slot matches the current state slot, that the slot is later than that of the latest block header, and that the block’s proposer index is actually the validator that the randao-based selection picked. Lastly, it checks that the block’s parent root matches the root of the latest block header. If all of this validates, the block’s header gets stored as the latest block header in the state.
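The header checks can be sketched as a function that collects everything that is wrong with a block. The block is a plain dict and the other inputs are passed in directly, which is a simplification; in the spec they are all read from the state. The spec also rejects blocks from a proposer who has been slashed, which is included here as the last check.

```python
def header_problems(block: dict, state_slot: int, latest_header_slot: int,
                    latest_header_root: bytes, expected_proposer: int,
                    proposer_slashed: bool) -> list:
    """Return a list of reasons the block header is invalid (empty if it is valid)."""
    problems = []
    if block["slot"] != state_slot:
        problems.append("block slot does not match state slot")
    if block["slot"] <= latest_header_slot:
        problems.append("a block was already processed for this slot or later")
    if block["proposer_index"] != expected_proposer:
        problems.append("wrong proposer for this slot")
    if block["parent_root"] != latest_header_root:
        problems.append("parent root does not match latest block header")
    if proposer_slashed:
        problems.append("proposer has been slashed")
    return problems
```

In the real state transition, any single failure rejects the block outright; returning a list just makes the individual checks easier to see.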

Next, the proposer must include a randao reveal, which, as I said earlier, is basically the current epoch number signed by the validator. The chain verifies the signature against the proposer’s public key. It is easy to tell that this method is deterministic: the validator’s signature will always be the same for the same epoch number. True, and that is in fact the whole point of doing it this way. Anyone can use the validator’s public key to check that the reveal really is a signature over the epoch number, and the proposer has no room to pick a different value that would suit them better. Note that a proposer cannot skip the randao reveal. If they propose a block, they must include it; the only way of skipping the process is by not proposing a block in the first place.

After that, the proposer will also include their view of the Eth1 chain deposit contract state as an Eth1 data vote. If any Eth1 data value in the votes list reaches a majority, it becomes the new Eth1 data in the state.

During block processing, each proposer slashing in the block contains evidence that a validator signed two different block headers for the same slot. The chain verifies both signatures and, if they are valid, slashes the validator by setting their slashed flag to true and adding their effective balance to the slashings vector. Likewise, each attester slashing contains a pair of conflicting attestations; the chain verifies them, identifies the culprit validators as those who signed both, and then slashes them in the same way as for a proposer slashing.

New attestations in the block get validated, and each one is converted into a pending attestation by adding two new fields, the inclusion delay and the proposer index. It is then appended to either the current epoch attestations or the previous epoch attestations. Nothing much happens after that, as they don’t affect the state immediately; they sit there until epoch processing evaluates them.

New validator deposits from the Eth1 deposit contract get processed; the block must include all pending deposits, up to the maximum of 16 per block. The chain verifies each deposit’s Merkle proof against the deposit root to ensure it’s correct. If the depositor’s public key is new, a new validator entry is added, along with the corresponding balance; if the key already exists, the deposit is added to that validator’s balance.

A validator can signal that they want to leave by submitting a signed voluntary exit. The chain checks that the validator has been active for at least 256 epochs, and that the current epoch is at least the epoch specified in the exit message. If everything checks out, the validator’s exit epoch and withdrawable epoch get set.
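The eligibility checks for a voluntary exit can be sketched as follows. The constants are the spec’s SHARD_COMMITTEE_PERIOD and FAR_FUTURE_EPOCH; the function name and its flat argument list are my own simplification, and the signature check is left out.

```python
FAR_FUTURE_EPOCH = 2**64 - 1      # marker for "no exit scheduled yet"
SHARD_COMMITTEE_PERIOD = 256      # minimum epochs of service before exiting


def can_exit(activation_epoch: int, exit_epoch: int,
             requested_epoch: int, current_epoch: int) -> bool:
    """True if a voluntary exit for this validator would be accepted right now."""
    is_active = activation_epoch <= current_epoch < exit_epoch
    not_already_exiting = exit_epoch == FAR_FUTURE_EPOCH
    request_is_due = current_epoch >= requested_epoch
    served_long_enough = current_epoch >= activation_epoch + SHARD_COMMITTEE_PERIOD
    return is_active and not_already_exiting and request_is_due and served_long_enough
```

So a validator activated at epoch 100 cannot exit before epoch 356, no matter what epoch they put in the exit message.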

After everything above gets processed, there is one last but important piece: each node compares the state root it computed with the state root included in the block. If they don’t match, the entire block is rejected.

Epoch Processing

Epoch processing gets triggered at the epoch boundary, once every 32 slots. It runs during slot processing when the slot counter is about to cross into a new epoch. It is the most complex stage, as most of the cool consensus stuff happens here.

First, justification and finalization kick in. This is where Casper FFG does its work. The process might sound complex, but it is very easy. Put simply, the chain looks at the attestations from the previous epoch (and the current one) and sums the effective balances of the validators who attested with the correct target. If that sum is at least 2/3 of the total active balance, the target epoch becomes justified, and if two consecutive epochs are justified, the earlier one becomes finalized. Easy!
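The supermajority test and the bookkeeping for the justification bits can be sketched like this. The integer comparison avoids fractions, the way the spec does it. The function names are mine, and the actual finalization rules in the spec cover a few more cases than the simple "two in a row" one.

```python
def is_supermajority(attesting_balance: int, total_active_balance: int) -> bool:
    """True if at least 2/3 of the active stake attested to the target."""
    return attesting_balance * 3 >= total_active_balance * 2


def update_justification_bits(bits: list, previous_justified: bool,
                              current_justified: bool) -> list:
    """Shift the 4-epoch history by one and record this round's results.

    bits[0] refers to the current epoch, bits[1] to the previous one, and so on.
    """
    new_bits = [False] + bits[:-1]
    if previous_justified:
        new_bits[1] = True
    if current_justified:
        new_bits[0] = True
    return new_bits
```

When the two most recent bits are both set and the newer checkpoint was justified with the older one as its source, the older checkpoint is the one that gets finalized.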

One fact that I didn’t point out in the block processing stage is that validators earn rewards along the way: for attesting correctly, for including attestations or slashing evidence in the blocks they propose, and so on. In the epoch processing stage, the chain evaluates these rewards and penalties over the previous epoch and adjusts the validators’ balances accordingly.

If the chain hasn’t finalized in more than four epochs, what is described as “the inactivity leak” kicks in. On top of the normal penalties for missing attestations, non-participating validators get hit with an additional inactivity penalty that grows with each epoch since the last finalized epoch, so their total loss grows quadratically. In other words, the longer the chain goes without finalizing, the harsher the penalty. You might be wondering what the point of this is. When the chain hasn’t finalized for a while, it means that less than 2/3 of the stake is voting together, and since a validator’s vote is weighted by its effective balance, basically how much ETH the validator has, draining the balances of the non-participating validators reduces their voting power. Eventually the validators that are still participating make up 2/3 of the remaining stake again, and the chain can finalize. Thus, Ethereum uses it as a way to force finality. It is worth pointing out that during an inactivity leak, even correct attesters don’t receive rewards. Everything shifts to pure penalty mode to speed up the rebalancing.
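To see why the loss is quadratic, here is a sketch of the Phase 0 inactivity penalty as I understand it: each epoch’s penalty is proportional to the number of epochs since finality, so the running total grows with the square of that number. The quotient of 2^26 is the Phase 0 value and was changed in later forks, and the effective balance is held constant here for simplicity.

```python
MIN_EPOCHS_TO_INACTIVITY_PENALTY = 4
INACTIVITY_PENALTY_QUOTIENT = 2**26  # assumed Phase 0 value


def in_inactivity_leak(finality_delay: int) -> bool:
    """True once the chain has gone more than four epochs without finalizing."""
    return finality_delay > MIN_EPOCHS_TO_INACTIVITY_PENALTY


def inactivity_penalty(effective_balance: int, finality_delay: int) -> int:
    """Extra penalty (Gwei) for one non-participating validator in one epoch."""
    if not in_inactivity_leak(finality_delay):
        return 0
    return effective_balance * finality_delay // INACTIVITY_PENALTY_QUOTIENT


def cumulative_penalty(effective_balance: int, epochs_without_finality: int) -> int:
    """Total leaked over a stretch without finality (balance held constant)."""
    return sum(inactivity_penalty(effective_balance, delay)
               for delay in range(1, epochs_without_finality + 1))
```

Doubling the length of the outage roughly quadruples the total amount leaked, which is what eventually lets the participating validators regain a 2/3 majority.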

Another important event that happens at this stage is the activation of validators whose activation eligibility epoch has been reached. Also, validators whose exit epoch has arrived are moved out of the active set of validators.

Next, remember that the effective balance doesn’t update with every slot and epoch, because hysteresis applies. It only updates when the actual balance has moved far enough away from it: if the actual balance has risen above the upward threshold, the effective balance increases, typically by 1 ETH, and if it has dropped below the downward threshold, it decreases in the same way.

Lastly, the current epoch’s randao mix gets copied into the next epoch’s entry. As you can see, epoch processing is the most complicated part of the state machine, and it is computationally expensive, so it always leads to a lot of optimization work for the engineers building the chain clients.

From Phase 0 to Fulu

The BeaconState we’ve walked through so far is the Phase 0 version, the original. But the Beacon Chain is a living system. Every consensus layer fork has modified the BeaconState, either by adding new fields, removing old ones, or just changing how some existing ones operate.

For example, in the Altair fork, the previous and current epoch attestations lists were removed entirely and replaced by previous and current epoch participation. The difference is that, while the old fields stored full attestation objects, the new participation fields just store a few flag bits per validator. Now, each validator just gets three bits per epoch representing whether they got the source right, the target right, and the head right. This led to a drastic reduction in memory size. Bellatrix, the Merge fork, introduced the latest execution payload header field to the BeaconState; this field is what connects the consensus layer to the execution layer. The Eth1 fields were kept, since deposits still came in through the deposit contract, but their role diminished.
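The three participation bits are easy to picture with a little bit twiddling. The flag indices and the add_flag and has_flag helpers below mirror the ones in the Altair spec as far as I recall; everything fits in a single small integer per validator.

```python
TIMELY_SOURCE_FLAG_INDEX = 0
TIMELY_TARGET_FLAG_INDEX = 1
TIMELY_HEAD_FLAG_INDEX = 2


def add_flag(flags: int, flag_index: int) -> int:
    """Return flags with the bit at flag_index switched on."""
    return flags | (1 << flag_index)


def has_flag(flags: int, flag_index: int) -> bool:
    """True if the bit at flag_index is set."""
    mask = 1 << flag_index
    return flags & mask == mask
```

A validator that got the source and the head right but missed the target would end up with the value 0b101, which is a lot smaller than a full pending attestation object.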

The Capella fork introduced withdrawals for validators, and it tracks them with new fields such as the next withdrawal index in the BeaconState. Historical roots was superseded by historical summaries, which store the block roots summary and the state roots summary side by side in a struct instead of hashing them together. The Deneb fork, on the other hand, had almost zero impact on the BeaconState, because the fork was mainly concerned with blobs. In Electra, the key change was raising the maximum effective balance from 32 ETH to 2048 ETH, which led to the introduction of several new fields in the BeaconState, and Fulu extended the state further.

Each fork built on the last, and the BeaconState grew from 21 fields in Phase 0 to over 30 in Fulu. One thing is consistent from Phase 0 to Fulu: the state has grown, forks add fields, forks remove fields, but the architecture has stayed the same.

Conclusion

The Beacon Chain is often described as complex, and yes, I do agree, because it is complex. But at its core, it is basically a cycle: a state exists, an input arrives, the existing state gets updated to produce a new state, then repeat! Simple! What makes the Beacon Chain truly remarkable is not any single field, but how all of them connect to give us this awesome ‘tek’ we have today!

References

Ethereum Consensus Specifications (Phase 0)

Ethereum Annotated Specification by Ben Edgington eth2book.info

Gasper: Combining GHOST and Casper (Original Paper)

Casper the Friendly Finality Gadget (Original Paper)

Lighthouse Client Implementation

Ethereum Beacon Chain Explorer

Ethereum Foundation Blog on the Merge

Ethereum Annotated Spec by Vitalik Buterin