Ethereum processes vast amounts of data every second. To keep the network efficient and accessible, this data must be organized in standardized and memory-optimized formats. To achieve this, Ethereum uses specialized data structures that allow anyone to run a node, even on modest consumer-grade hardware.
This blog explores the core data structures that power Ethereum’s execution and consensus layers: Patricia Merkle Tries, which provide cryptographically secure storage of key-value pairs, Recursive Length Prefix (RLP), a compact and consistent data serialization method, and Simple Serialize (SSZ), the consensus layer’s preferred serialization format designed to work seamlessly with merklelization.
Understanding these foundations is key to seeing how Ethereum achieves security, scalability, and decentralization.
In this blog, we will focus specifically on Merkle Patricia Tries.
Merkle Patricia Trie
Ethereum works like a state machine. Every transaction updates the global state, which contains account balances, contract storage, and code. This state is stored as key–value pairs, and Ethereum needs a data structure that is both efficient and secure. For this, it uses the Modified Merkle Patricia Trie (MPT).
The MPT combines two ideas: tries and Merkle trees. A trie is a tree-like structure for looking up keys by following a path of characters (or, in Ethereum’s case, hexadecimal nibbles). A Merkle tree is a hash-based structure that allows anyone to verify data without needing the whole dataset.
By combining them, Ethereum gets a database that is efficient for lookups and updates, while also cryptographically secure against tampering.
The key strength of the MPT is that it produces a single root hash that represents the entire state of Ethereum. If even one balance changes, this root hash changes. This means two nodes can quickly agree on whether they have the same state by just comparing root hashes. It also enables Merkle proofs, where someone can prove a piece of data exists in the state by only providing the path and hashes, rather than the whole database.
Radix Tries (The Foundation)
The foundation of MPT is a radix trie. In a radix trie, each key is split into symbols (like hex digits), and we walk down the tree one symbol at a time. Each node can have multiple children, and the value is stored at the end of the path.
For example, suppose the key "dog"
is stored. First, we convert it into hex (64 6f 67
). Then we start at the root and follow the path nibble by nibble: 6 → 4 → 6 → 15 → 6 → 7
. At the end of this path, we find the value associated with "dog"
.
Updating works similarly: you walk down the path, copy the nodes that change, update the value, and then recompute the hashes on the way back up. Deleting means following the path, removing the value, and cleaning up any nodes that become unnecessary.
While this works, it can be very inefficient. Ethereum addresses are 32 bytes, which is 64 hex characters. A plain radix trie would require 64 steps just to look up one key, with many empty branches along the way. This is where the Patricia optimization comes in.
Adding Hashing: Merkle Radix Tries
Ethereum doesn’t just store values in the trie—it also hashes them. Every node is stored in the database under a key equal to the hash of its contents, specifically keccak256(rlp(value))
. This makes the structure content-addressable: the same node always has the same hash, and if the content changes, the hash changes.
This hashing turns the trie into a Merkle Radix Tree, where the root hash acts as a fingerprint of the entire structure. If you know the root, you can verify that any piece of data belongs in the trie by walking down the path and checking the hashes. If someone tries to fake data, the hashes won’t line up.
Think of it like a library catalog: the root hash is like a digital seal on the entire catalog. If even one entry is wrong, the seal won’t match.
The Patricia Optimization
To fix the inefficiency of radix tries, Ethereum adds Patricia optimizations. Instead of storing one nibble per node, paths are compressed. There are three types of nodes in MPT:
- Branch nodes, which have 16 child slots (one for each hex nibble) plus a value slot.
- Leaf nodes, which store the end of a path and the value.
- Extension nodes, which store a sequence of nibbles leading to the next node.
This way, instead of having 64 steps for a 64-nibble path, we might only have a handful of steps, since long single-child paths are collapsed into extension nodes.
Compact Encoding of Paths
When Ethereum stores paths in the Merkle Patricia Trie, it does not save them nibble by nibble (too large). Instead, it uses a compact encoding scheme that packs extra information into the first nibble.
This first nibble tells us two things:
-
Node type: Is it a leaf node or an extension node?
-
Path length: Is the remaining path odd or even in length?
The rules are:
Hex |
Bits |
Node type |
Path length |
---|---|---|---|
0 |
0000 |
Extension |
Even |
1 |
0001 |
Extension |
Odd |
2 |
0010 |
Leaf (end) |
Even |
3 |
0011 |
Leaf (end) |
Odd |
For even remaining path length (0
or 2
), another 0
"padding" nibble will always follow.
def compact_encode(hexarray):
term = 1 if hexarray[-1] == 16 else 0
if term:
hexarray = hexarray[:-1]
oddlen = len(hexarray) % 2
flags = 2 * term + oddlen
if oddlen:
hexarray = [flags] + hexarray
else:
hexarray = [flags] + [0] + hexarray
# hexarray now has an even length whose first nibble is the flags.
o = ""
for i in range(0, len(hexarray), 2):
o += chr(16 * hexarray[i] + hexarray[i + 1])
return o
Example:
- Input:
> hexarray = [0xf, 0x1, 0xc, 0xb, 0x8, 0x10] or [f, 1, c, b, 8, 10]
'3f 1c b8'
Notice the last nibble is 0x10
(decimal 16).
- That’s theterminator marker (leaf) in Ethereum’s compact encoding.
Walk through code:
term = 1 if hexarray[-1] == 16 else 0
Last element = 16 → so term = 1.
if term:
hexarray = hexarray[:-1]
Strip the terminator →
hexarray = [f, 1, c, b, 8] (5 nibbles).
oddlen = len(hexarray) % 2
Length = 5 → oddlen = 1
flags = 2 * term + oddlen
2 * 1 + 1 = 3 → so flags = 3.
if oddlen:
hexarray = [flags] + hexarray
Since oddlen = 1, prepend 3 →
hexarray = [3, f, 1, c, b, 8].
Now, Pack Into Bytes
The loop does:
o += chr(16 * hexarray[i] + hexarray[i + 1])
Pairs:
(3, f) → 0x3f
(1, c) → 0x1c
(b, 8) → 0xb8
Output:
'3f 1c b8'
-
Input:
> hexarray = [0x1, 0x2, 0x3, 0x4, 0x5, ...] or [1, 2, 3, 4, 5, ...] '11 23 45'
Walk through code:
term = 1 if hexarray[-1] == 16 else 0
Last element = 5 → so term = 0.
if term:
hexarray = hexarray[:-1]
will not execute as term = 0
oddlen = len(hexarray) % 2
Length = 5 → oddlen = 1
flags = 2 * term + oddlen
2 * 0 + 1 = 1 → so flags = 1.
if oddlen:
hexarray = [flags] + hexarray
Since oddlen = 1, prepend 1 →
hexarray = [1, 1, 2, 3, 4, 5].
Now, Pack Into Bytes
The loop does:
o += chr(16 * hexarray[i] + hexarray[i + 1])
Pairs:
(1, 1) → 0x11
(2, 3) → 0x23
(4, 5) → 0x45
Output:
'11 23 45'
Here is the extended code for getting a node in Merkle Patricia Trie.
def get_helper(node_hash, path):
if path == []:
return node_hash
if node_hash == "":
return ""
curnode = rlp.decode(node_hash if len(node_hash) < 32 else db.get(node_hash))
if len(curnode) == 2:
(k2, v2) = curnode
k2 = compact_decode(k2)
if k2 == path[: len(k2)]:
return get(v2, path[len(k2) :])
else:
return ""
elif len(curnode) == 17:
return get_helper(curnode[path[0]], path[1:])
def get(node_hash, path):
path2 = []
for i in range(len(path)):
path2.push(int(ord(path[i]) / 16))
path2.push(ord(path[i]) % 16)
path2.push(16)
return get_helper(node_hash, path2)
Example Trie
Let’s walk through an example of how a Merkle Patricia Trie (MPT) is built from key-value pairs.
We’ll use five accounts with these key-value pairs:
Account |
Key (address/hash, shortened) |
Value (simplified state) |
---|---|---|
1 |
|
|
2 |
|
|
3 |
|
|
4 |
|
|
5 |
|
|
In Ethereum, the keys are normally 32-byte Keccak-256 hashes of addresses. Here, we’re using shorter keys so the example is easy to follow.
Step 1: Insert the first key-value pair.
Start with an empty MPT, and add the first key-value pair: (0x616b6c64, 0x01)
. Since it is just one key, it results in a trie of only one leaf node. To create that node, proceed in the following way:
-
Write a two-item array: The first element should be the whole key and second one the value. Add the prefix flag
20
to the first element, indicating that the node is a leaf and that the key has an even amount of nibbles. Then, the array should look like this:["0x20616b6c64","0x01"]
. -
Encode in RLP the array: You can use an
RLP converter or this Python script:import rlp key = bytes.fromhex("20616b6c64") value = bytes.fromhex("01") encoded = rlp.encode([key, value]) print(encoded.hex())
This should output this hex: c78520616b6c6401
.
-
Hash the RLP encoding using Keccak-256 to get the pointer of this node. You can use an
online hasher or this script:from eth_utils import keccak rlp_bytes = bytes.fromhex("c78520616b6c6401") hash_pointer = keccak(rlp_bytes) print(hash_pointer.hex())
This should output:
4e2d0fbe6726eac15c5ecf49a4e1f947aa50e0531f4f3e98b8e1577ba52e1783
The resulting MPT should look like this:
[Key, Value] -> [“0x616b6c64”, “0x01”]
RLP Encoding -> 0xc78520616b6c6401
Hash of RLP Encoding -> 4e2d0fbe6726eac15c5ecf49a4e1f947aa50e0531f4f3e98b8e1577ba52e1783
Step 2: Insert the second key-value pair.
[“0x616b6c65”, “0x02”]
. Since this key shares the first 7 digits with the previous one, we proceed in the following way:
-
Build one leaf for each key: In each leaf, the array's first element should be the remaining path, but since the two keys share every digit except the last one, the remaining path is empty. So, we should just write there the flag
20
, indicating that we are in a leaf node and that the path has an even amount of digits (zero digits). After that, encode the arrays in RLP and hash the RLP encodings, as we did in step 1. -
Build a branch node: Create a 17-item array. Write the hash of the first key's leaf node at index 4 and the hash of the second key's leaf node at index 5. Encode the array in RLP and hash the encoding.
-
Build the extension and root node: Create a two-item array that contains the shared prefix as the first element and the hash of the previously built branch node as the second element. Since the shared prefix has an odd number of digits, add the flag
1
to it. Encode the array in RLP and hash the encoding.[Key, Value] -> ["0x20", "0x01"] RLP Encoding: 0xc22001 Hash of RLP Encoding: 0xce8c4b060e961e285a1c2d6af956fae96986f946102f23b71506524eea9e2450 [Key, Value] -> ["0x20", "0x02"] RLP Encoding: 0xc22002 Hash of RLP Encoding: 0x1e116569fc77f5d4721ac60cbbeee7bdf40344511d9c4b1ac6ac9e6a8ae3c4e7 [“0x”, "0x", "0x", "0x", "0xce8c4b060e961e285a1c2d6af956fae96986f946102f23b71506524eea9e2450", "0x1e116569fc77f5d4721ac60cbbeee7bdf40344511d9c4b1ac6ac9e6a8ae3c4e7", "0x", "0x", "0x", "0x", "0x", "0x", "0x", "0x", "0x", "0x", "0x"] RLP Encoding: 0xf85180808080a0ce8c4b060e961e285a1c2d6af956fae96986f946102f23b71506524eea9e2450a01e116569fc77f5d4721ac60cbbeee7bdf40344511d9c4b1ac6ac9e6a8ae3c4e78080808080808080808080 Hash of RLP Encoding: 0xc64b8c068233f978ac4e505b00ab3186aa3deeb29e76c928cd372161b899bc2b [Key, Value] -> ["0x1616b6c6", "0xc64b8c068233f978ac4e505b00ab3186aa3deeb29e76c928cd372161b899bc2b"] RLP Endcoding: 0xe6841616b6c6a0c64b8c068233f978ac4e505b00ab3186aa3deeb29e76c928cd372161b899bc2b Hash of RLP Encoding: 0x870b83eb1f33d66c00bf9b0e1b3fce53ef7b8e9999d7f94df24214ae49238d2e
Step 3: Insert the third key-value pair.
[“0x616b6c78”, “0x03”]
, Since this key shares the first 6 digits with the previous two keys, we proceed in the following way:
-
Add a leaf node: Notice that in this case, since the new key shares with the previous ones all the digits except the last two, the array's first item will have just one digit as the remaining path, and the flag
3
indicating that it is a leaf node with an odd amount of path digits. -
Add a branch node: Its array should contain at index 6 the hash pointer of the branch node built in step 2, and at index 7 the hash pointer of the new leaf node we recently added.
-
Add an extension node: The root will be another extension node. Its array should contain the shared prefix as the first element and the hash of the recently added branch node as the second element. Since the share prefix has an even number of digits, add the flag
00
to it.[Key, Value] -> ["0x38", "0x03"] RLP Encoding: 0xc23803 Hash of RLP Encoding: 0xcbfcfed99ffa2123bea4569cf9b9201b1f87e4649acfbdb4ad36aa1f5317c33a ["0x", "0x", "0x", "0x", "0x", "0x", "0xc64b8c068233f978ac4e505b00ab3186aa3deeb29e76c928cd372161b899bc2b", "0xcbfcfed99ffa2123bea4569cf9b9201b1f87e4649acfbdb4ad36aa1f5317c33a", "0x", "0x", "0x", "0x", "0x", "0x", "0x", "0x", "0x"] RLP Encoding: 0xf851808080808080a0c64b8c068233f978ac4e505b00ab3186aa3deeb29e76c928cd372161b899bc2ba0cbfcfed99ffa2123bea4569cf9b9201b1f87e4649acfbdb4ad36aa1f5317c33a808080808080808080 Hash of RLP Encoding: 0xcc97f12ea3217345e666974cd81b117ca02404f19c15d31158ac1d1e55398706 [Key, Value] -> ["0x00616b6c", "0xcc97f12ea3217345e666974cd81b117ca02404f19c15d31158ac1d1e55398706"] RLP Endcoding: 0xe68400616b6ca0cc97f12ea3217345e666974cd81b117ca02404f19c15d31158ac1d1e55398706 Hash of RLP Encoding: 0x85fb3f4e331ecdabcf4d198863553663a6191de909e8e4a8dad0dd8d50debb97
-
Add the last two key-value pairs, continuing in this way, following the same steps as we did for the previous keys. When you're done, you should have the following MPT:
Verify
When a verifier gets the StateRoot
and the proof path for a key, they must check if the proof is correct. Unlike a normal Merkle Tree, where the check starts at the leaf and goes up to the root, in a Merkle Patricia Trie, the process goes from the top down.
The verifier begins at the StateRoot
and then moves step by step through the nodes in the proof path. At each step, they decode the node and make sure that the hash stored inside matches the next node in the path. This continues until the verifier reaches the final leaf node. If this leaf contains both the remaining part of the key and the correct value, then the proof is valid.
Let's say the verifier receives the proof of the above for the key-value (0x616b6d31, 0x04)
. Then, he has to follow these steps:
-
Hash the first element of the path and check that it matches the given StateRoot. Indeed:
keccak(bytes.fromhex( "f851808080a0a26b2ac124718443aeed68fa0309225d0c8dd9dbee45685909e92fb594e1a4638080a02ccd118c9470c051689543b233ab109ad74d2fb4f57eb429c4d43294d6ae686780808080808080808080" )).hex() == "13ea549e268b5aa80e9752c6be0770cffba34d2b1aa1f858cb90f3f13ac3bed7"
-
Decode the first RLP element of the path and verify that in the index, 6 has the hash of the second path element. Indeed:
rlp.decode(bytes.fromhex( "f851808080a0a26b2ac124718443aeed68fa0309225d0c8dd9dbee45685909e92fb594e1a4638080a02ccd118c9470c051689543b233ab109ad74d2fb4f57eb429c4d43294d6ae686780808080808080808080" ))[6].hex() == "2ccd118c9470c051689543b233ab109ad74d2fb4f57eb429c4d43294d6ae6867"
-
Decode the second RLP element of the path. You will see an array with two items. The first item is
0x1616b6
. Because it starts with1
, this means it is an extension node. Now, check that the rest of the digits match the key we are trying to prove. Then, confirm that the second item in the array is the hash of the next element in the path. -
Decode the third element of the path. This will be a branch node. Look inside it, and make sure that at index
d
, it holds the hash of the following element in the path. Merkle Patricia Trie
-
Decode the last element of the path. This will be an array with two items. The first item is
0x2031
. Since it begins with20
, this means it is a leaf node. Verify that this first item contains the remaining key digits31
and that the second item is the value0x04
.
Tries in Ethereum
In Ethereum’s execution layer, all the merkle tries are based on a data structure called the Merkle Patricia Trie (MPT). From every block header, you can actually see three important roots — each representing a different trie:
- stateRoot – tracks account state.
- transactionsRoot – tracks all transactions in the block.
- receiptsRoot – tracks receipts of those transactions.
Let’s break them down one by one.
State Trie
There is one global state trie, and it gets updated every time a block is processed.
- The path in this trie is always
keccak256(ethereumAddress)
. - The value stored at that path is
rlp(ethereumAccount)
.
An Ethereum account is stored as an array of four items:
[ nonce, balance, storageRoot, codeHash ]
Here’s the interesting part: the storageRoot
itself is the root of another trie — the Storage Trie.
Storage Trie
The storage trie is where all contract data lives. Each account (that is, a contract) has its own storage trie.
If you want to retrieve a contract’s stored value, you need:
- the storage address (contract address),
- the position of the storage slot, and
- the block number (or just
"latest"
for the current state).
For example, to fetch data from slot 0 of address 0xc0ffee254729296a45a3885639AC7E10F9d54979
, you can use the JSON-RPC method eth_getStorageAt
:
curl -X POST --data '{
"jsonrpc":"2.0",
"method":"eth_getStorageAt",
"params": [
"0xc0ffee254729296a45a3885639AC7E10F9d54979",
"0x0",
"latest"
],
"id":1
}' localhost:8545
This might return something like:
{"jsonrpc":"2.0","id":1,"result":"0x00000000000000000000000000000000000000000000000000000000000004d2"}
For other slots, you can’t just pass the number directly. Instead, Ethereum calculates the slot’s position as:
keccak256( padded(address) + padded(slotIndex) )
For example, for slot 1,
at address 0x391694e7e0b0cce554cb130d723a9d27458f9298
, the key is:
keccak256("000...391694e7...f9298" + "000...0001")
In a Geth console, you can calculate it with:
var key = "000000000000000000000000391694e7e0b0cce554cb130d723a9d27458f9298" +
"0000000000000000000000000000000000000000000000000000000000000001"
web3.sha3(key, { "encoding": "hex" })
This gives the hashed position, which you then pass into eth_getStorageAt
to fetch the actual stored value.
If an account is not a contract, its storageRoot
is empty.
Transactions Trie
Every block also has its own transactions trie. This stores all the transactions of that block as (key, value)
pairs.
- The path is
rlp(transactionIndex)
(basically the transaction’s position in the block). - The value depends on the transaction type:
-
For legacy transactions →
rlp(tx)
-
For typed transactions (EIP-2718) →
TxType | encode(tx)
-
This allows Ethereum to support both old and new transaction formats
Receipts Trie
Finally, every block also has a receipts trie. Like the transactions trie, the path is also rlp(transactionIndex)
.
-
The receipts trie is never updated once the block is sealed.
-
A receipt is basically proof of what happened in a transaction (gas used, logs, etc.).
Receipts can be of two kinds:
-
LegacyReceipt → stored as
rlp([status, cumulativeGasUsed, logsBloom, logs])
. -
Receipt (typed) → stored as
TransactionType | ReceiptPayload
.
Again, this supports both old and new transaction formats, thanks to EIP-2718.