Understanding the Ethereum Swarm Storage Scaling Mechanism

by Laszlo Fazekas, September 9th, 2024

Too Long; Didn't Read

"Swarm is like a living organism that "eats" data, and once it has consumed a certain amount of data, it divides." This article is about the scaling mechanism of Ethereum Swarm, and how it optimizes storage usage through the incentive system and the protocol.


With the Ethereum blockchain as the CPU of the world computer, Swarm is best thought of as its "hard disk". - Viktor Trón, Architect of the SWARM project


This "hard disk" comprises networked nodes that share a portion of their storage capacity. In the case of a physical hard disk, it's relatively simple to find space for the data. We simply look for free sectors and write the data. In contrast, Swarm is a decentralized system that is also fully dynamic. Nodes drop out, and new nodes join. There is no central system to dictate where each piece of data should be stored; this is a decision the nodes must make, guided by the protocol and the incentive system. This is an extremely complex problem, and Ethereum Swarm provides a unique and elegant solution. However, to understand how the system works, we must first become familiar with the concept of a "neighborhood”.


Imagine a minimal Swarm network consisting of 4 nodes. Each node provides 20GB of storage capacity to the network. Initially, the virtual hard disk is completely empty.


Swarm splits all data into 4KB chunks and calculates the hash for each chunk. This 256-bit hash is the unique identifier for the chunk. (There are other types of chunks in Swarm as well, but for simplicity, we will not cover them now.)
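To make this concrete, here is a minimal Python sketch of the idea: the payload is cut into 4KB slices, and each slice gets a 256-bit identifier. Note that the real Swarm chunk address is a Binary Merkle Tree (BMT) hash of the chunk; plain SHA-256 is used below only as a stand-in.

```python
# Simplified sketch: split a payload into 4 KB chunks and derive a 256-bit
# identifier for each. The real Swarm chunk address is a Binary Merkle Tree
# (BMT) hash; plain SHA-256 is used here only as a stand-in.
import hashlib

CHUNK_SIZE = 4096  # 4 KB

def chunk_data(data: bytes):
    """Yield (chunk_id, chunk) pairs for each 4 KB slice of the input."""
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        yield hashlib.sha256(chunk).digest(), chunk

# A 10 KB payload becomes three chunks, each with a 32-byte (256-bit) id.
payload = b"\x42" * (10 * 1024)
for chunk_id, chunk in chunk_data(payload):
    print(chunk_id.hex()[:16], len(chunk))
```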


Similarly, each node has its own address, composed of an Ethereum public key, a nonce, and a network identifier. This is called the overlay address, and it is also 256 bits.
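A hedged sketch of how such an overlay address can be derived is shown below; the 20-byte Ethereum account address (itself derived from the public key), the exact byte layout, and the hash function are assumptions for illustration, not the Bee client's actual encoding.

```python
# Hedged sketch of deriving an overlay address. Bee nodes hash the node's
# Ethereum identity together with the network id and a nonce; the byte
# layout and hash function below are assumptions for illustration only.
import hashlib

def overlay_address(eth_address: bytes, network_id: int, nonce: bytes) -> bytes:
    assert len(eth_address) == 20 and len(nonce) == 32
    # Assumed encoding: address || network id (8-byte little-endian) || nonce
    preimage = eth_address + network_id.to_bytes(8, "little") + nonce
    return hashlib.sha256(preimage).digest()  # stand-in for Keccak-256

addr = overlay_address(bytes(20), network_id=1, nonce=bytes(32))
print(len(addr) * 8)  # 256 bits
```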


When we start filling this network with data, each node stores every chunk until the 20GB storage capacity is reached. At that point, storage must be optimized to accommodate new data. Swarm's solution is that, in such cases, nodes partition the address space among themselves: some nodes retain chunks whose first bit is 1, while others retain chunks whose first bit is 0. Which chunks a node keeps is determined by its own address: a node whose address starts with a 1 bit keeps the chunks starting with a 1 bit, and a node whose address starts with a 0 bit keeps the chunks starting with a 0 bit.


In Swarm, two addresses are considered closer the more leading bits they have in common. So, we can also say that each node stores the chunks that are "closer" to it.
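The following sketch illustrates this notion of closeness: the proximity of two addresses is the number of leading bits they share, and a chunk is assigned to the node whose address shares the longest prefix with the chunk's hash.

```python
# Sketch of Swarm-style closeness: the more leading bits two 256-bit
# addresses share, the closer they are, and a chunk is kept by the nodes
# whose overlay addresses share the longest prefix with the chunk's hash.
def proximity(a: bytes, b: bytes) -> int:
    """Number of leading bits shared by two equal-length addresses."""
    bits = 0
    for x, y in zip(a, b):
        diff = x ^ y
        if diff == 0:
            bits += 8
            continue
        bits += 8 - diff.bit_length()  # leading zero bits of the differing byte
        break
    return bits

def closest_node(chunk_id: bytes, node_addresses: list[bytes]) -> bytes:
    """Pick the node address sharing the longest prefix with the chunk id."""
    return max(node_addresses, key=lambda n: proximity(chunk_id, n))

nodes = [bytes([0b10000000]) + bytes(31), bytes([0b00000000]) + bytes(31)]
chunk = bytes([0b10100000]) + bytes(31)
print(proximity(chunk, nodes[0]))              # 2 shared leading bits
print(closest_node(chunk, nodes) == nodes[0])  # True: same first bit
```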


Ideally, if the distribution of chunks is uniform, the nodes can free up about half of their storage space this way. Similarly, if the distribution of the nodes' addresses is also ideal, that is, if two of the nodes' addresses start with a 1 bit and the other two start with a 0 bit, then our network will look like this: 2 nodes store the chunks starting with a 1 bit, and 2 nodes store the chunks starting with a 0 bit. Each partition stores approximately 10GB of chunks.


It may happen that the first bit of all 4 nodes' addresses is 1. In this case, no one would store the chunks starting with 0 bits. In such a situation, a few nodes can randomly decide to generate a new address for themselves, where the first bit is 0, thus moving to the other partition to ensure that chunks starting with 0 are also stored.
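A rough sketch of this "address mining" is shown below; it reuses the illustrative overlay_address() helper from earlier and simply retries random nonces until the first bit of the derived address matches the target partition.

```python
# Hedged sketch of "moving" to the other partition: the node tries new
# nonces until the derived overlay address starts with the desired bit.
# Reuses the illustrative overlay_address() helper sketched above.
import os

def mine_overlay_prefix_bit(eth_address: bytes, network_id: int, target_bit: int):
    while True:
        nonce = os.urandom(32)
        overlay = overlay_address(eth_address, network_id, nonce)
        if overlay[0] >> 7 == target_bit:  # inspect the first bit
            return nonce, overlay

nonce, overlay = mine_overlay_prefix_bit(bytes(20), network_id=1, target_bit=0)
print(overlay.hex())  # first bit is 0, so the node now serves the 0 partition
```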


If the storage in a partition becomes full, the address space can be repartitioned. At this point, the first 2 bits of the chunk's hash determine which node stores the chunk. So, if the 2 nodes in the 0 partition become full, they repartition their half of the address space, creating a 00 partition and a 01 partition. This frees up new space, allowing additional data to be stored.
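The sketch below shows the idea of partition depth: the first d bits of a chunk's hash select its partition, so going from depth 1 to depth 2 splits the 0 partition into 00 and 01.

```python
# Sketch of repartitioning: at depth d, the first d bits of a chunk's hash
# decide which partition (and thus which group of nodes) stores it.
def prefix_bits(address: bytes, depth: int) -> str:
    """Return the first `depth` bits of an address as a bit string, e.g. '01'."""
    bits = "".join(f"{byte:08b}" for byte in address[: (depth + 7) // 8])
    return bits[:depth]

chunk_id = bytes([0b01100000]) + bytes(31)
print(prefix_bits(chunk_id, 1))  # '0'  -> stored in the 0 partition at depth 1
print(prefix_bits(chunk_id, 2))  # '01' -> stored in the 01 partition at depth 2
```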


Of course, 4 nodes can be divided into a maximum of 4 partitions, but in the case of many nodes (Swarm currently operates with more than 10,000 nodes), the partitioning can obviously go much deeper.


Swarm is like a living organism that "eats" data, and once it has consumed a certain amount of data, it divides.


Swarm calls these partitions of the address space, where the nodes are grouped, "neighborhoods." To ensure redundancy, the incentive system is tuned so that, ideally, there are 4 nodes in a neighborhood. This provides sufficient fault tolerance for the system in case 1 or 2 nodes drop out. (There is also the possibility of providing better fault tolerance using erasure coding.)
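As a rough illustration, the sketch below groups node addresses into neighborhoods by their shared prefix and counts the members of each one (reusing prefix_bits() from the previous sketch); the incentive system tunes the network so that each such group holds about 4 nodes.

```python
# Sketch: group node overlay addresses into neighborhoods by their first
# `depth` bits and count the members of each one. Reuses prefix_bits()
# from the previous sketch; the 4-node target mirrors the incentive system.
from collections import defaultdict

def neighborhoods(node_addresses: list[bytes], depth: int) -> dict[str, list[bytes]]:
    groups: dict[str, list[bytes]] = defaultdict(list)
    for addr in node_addresses:
        groups[prefix_bits(addr, depth)].append(addr)
    return dict(groups)

nodes = [bytes([b]) + bytes(31) for b in (0b00000000, 0b01000000, 0b10000000, 0b11000000)]
for prefix, members in sorted(neighborhoods(nodes, depth=1).items()):
    print(prefix, len(members))  # '0' -> 2 nodes, '1' -> 2 nodes
```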


If there are more than 4 nodes in a neighborhood, it is no longer profitable to stay there, so some nodes will choose a new address and move to a neighborhood with fewer nodes, where they receive more BZZ tokens for storage. Equilibrium is reached when every neighborhood has 4 nodes, and in the long run the chunks to be stored are distributed evenly across these neighborhoods.


This "cell-division-like" self-organizing solution is how Swarm ensures the optimal use of distributed storage with 4x redundancy.