When everything works fine, you don't usually worry about blockchain testing. Below we explain why it's better not to shelve performance assessment, which metrics to use, and how to make the most of them. Let's dive in.
In the context of distributed systems, TPS is a very ambiguous and slippery metric.
TPS measurements came from distributed databases. They are usually performed using standardized transaction types or sets (e.g. some number of INSERTs, UPDATEs, and DELETEs along with a constant number of SELECTs) and configured for a particular cluster or a separate machine. Such “synthetic” metrics don’t reflect the real performance of the database or blockchain in question, because transaction processing time in such systems may vary.
Consistency-oriented databases (see the CAP theorem) do not commit a transaction until they receive a sufficient number of confirmations from other nodes, and this is slow.
Availability-oriented databases consider a transaction successful if it was simply written to disk. They immediately provide the updated data, and this is very fast (although the transaction may be rolled back in the future).
TPS will be higher if transactions update only one data cell. If transactions update many data cells (rows, indexes, files), they will block each other. That’s why we don’t see any “TPS competitions” between Oracle, MSSQL, PostgreSQL on the one hand, and MongoDB, Redis, Tarantool on the other - their internal mechanisms and tasks differ a lot.
From our point of view, “measuring blockchain TPS” means conducting a full range of performance measurements:
a) in repeatable conditions
b) with a realistic number of block validators
c) using various types of transactions:
- typical for the studied blockchain (for example, transfer() of the main cryptocurrency)
- loading the storage subsystem (considerable state changes from each transaction)
- loading network bandwidth (large transaction sizes)
- CPU-loading (massive cryptographic transformations or calculations)
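To make such a benchmark plan concrete and repeatable, it helps to pin the scenarios down as data. Below is a minimal sketch in Python; the scenario names, transaction kinds, and parameter values are hypothetical illustrations, not taken from any particular framework.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    """One load scenario of a reproducible blockchain benchmark (hypothetical)."""
    name: str           # which subsystem the scenario targets
    tx_type: str        # transaction kind submitted by the load generator
    tx_size_bytes: int  # payload size, stresses network bandwidth
    writes_per_tx: int  # state changes per transaction, stresses storage
    validators: int     # realistic validator count, fixed for repeatability

SCENARIOS = [
    Scenario("baseline",  "native_transfer",     200,   2, 21),
    Scenario("storage",   "bulk_state_write",    500, 100, 21),
    Scenario("bandwidth", "large_payload",    50_000,   2, 21),
    Scenario("cpu",       "heavy_crypto",        300,   2, 21),
]

for s in SCENARIOS:
    print(f"{s.name}: {s.tx_type}, {s.tx_size_bytes} B, "
          f"{s.writes_per_tx} writes/tx, {s.validators} validators")
```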
To talk about the cherished “transactions per second”, you need to describe all network conditions, parameters, and benchmarking logic. In blockchains, applying a transaction to some internal database does not mean consensus will accept it.
In PoW consensus, transactions are never finalized at all. If a transaction is included in a block on one machine, this does not mean it will be accepted by the entire network (e.g. if another fork wins).
If a blockchain has an additional algorithm that ensures finality (like EOS, Ethereum 2.0, or Polkadot parachains with GRANDPA finality), then the processing time can be viewed as the time between the moment the node “saw” the transaction and the next finalized block. Such “TPS” figures are very informative but rarely published, since they are lower than expected.
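As an illustration, measuring throughput against finality could look like the sketch below. The `node` client and its `send_tx()` / `last_finalized_block()` methods are hypothetical placeholders; map them to a real chain's RPC.

```python
import time

def finalized_tps(node, txs, poll_interval=0.5):
    """Measure TPS against finalization, not mere inclusion in a block.
    `node` is a hypothetical client exposing send_tx(tx) -> tx_hash and
    last_finalized_block() -> (height, set of tx hashes finalized so far)."""
    start = time.monotonic()
    pending = {node.send_tx(tx) for tx in txs}   # hashes of submitted txs
    while pending:
        _, finalized_hashes = node.last_finalized_block()
        pending -= finalized_hashes              # only finalized txs count
        time.sleep(poll_interval)
    elapsed = time.monotonic() - start
    return len(txs) / elapsed
```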
“TPS” can mean many things. Be skeptical and ask for details.
Local TPS
The number of processed transactions and max/avg/min processing time (on the local node) are very convenient to measure, since the functions performing these operations are usually easy to locate in the code. Transaction processing time equals the time needed to update the state database. For example, in “optimistic” blockchains a processed transaction may already be validated but not yet accepted by consensus. In this case, the node sends the updated data to the client (assuming there won’t be a chain fork).
This metric is not entirely honest: if another chain fork is chosen as the main one, statistics on the rolled-back transactions must be rolled back as well. In testing, this is often neglected.
“Our blockchain got 8,000 tps yesterday.” Such numbers can often be found in brief project reports, since they are easy to measure: just one running node and a loading script are enough. In this case, there is no network delay to slow down reaching network consensus.
The metric shows the performance of the state database without network influence. This number doesn’t reflect the real network bandwidth, but it shows the limit the real throughput will approach if the consensus and the network are fast enough.
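The single-node setup really is that simple. A minimal loading-script sketch follows: the endpoint and the Ethereum-style JSON-RPC method are assumptions, substitute your node's API; it reports how many transactions per second one node accepts.

```python
import json
import time
import urllib.request

NODE_URL = "http://127.0.0.1:8545"  # assumed local node RPC endpoint

def send_raw_tx(raw_tx_hex):
    """Submit one pre-signed transaction via JSON-RPC
    (Ethereum-style method name; adjust for your chain)."""
    payload = json.dumps({"jsonrpc": "2.0", "id": 1,
                          "method": "eth_sendRawTransaction",
                          "params": [raw_tx_hex]}).encode()
    req = urllib.request.Request(NODE_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req).read()

def local_tps(raw_txs):
    """Throughput as seen by one node, with no consensus or network delays."""
    start = time.monotonic()
    for tx in raw_txs:
        send_raw_tx(tx)
    return len(raw_txs) / (time.monotonic() - start)
```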
The result of any blockchain transaction is several atomic storage writes. For example, a Bitcoin payment transaction involves removing several old UTXOs (delete) and adding new ones (insert). In Ethereum, a small piece of smart contract code is executed and several key-value pairs are updated.
Atomic storage writes are an excellent metric to find storage subsystem bottlenecks and differentiate between low-level and internal logic issues.
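One way to collect this metric is to wrap the state storage and count atomic operations. A minimal sketch with an in-memory store standing in for LevelDB/RocksDB:

```python
class CountingStore:
    """Wraps a key-value store and counts atomic writes, mirroring how
    a UTXO payment turns into several deletes and inserts."""
    def __init__(self):
        self.kv = {}
        self.inserts = 0
        self.deletes = 0

    def put(self, key, value):
        self.inserts += 1
        self.kv[key] = value

    def delete(self, key):
        self.deletes += 1
        self.kv.pop(key, None)

# A Bitcoin-style payment: spend two old UTXOs, create two new ones.
store = CountingStore()
store.delete(b"utxo:old1")
store.delete(b"utxo:old2")
store.put(b"utxo:new1", b"0.70 BTC")
store.put(b"utxo:new2", b"0.29 BTC")
print(store.inserts, "inserts and", store.deletes, "deletes per tx")
```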
Blockchain node software can have implementations in several programming languages, which improves reliability. For example, there are Rust and Go implementations of the Ethereum node. Keep this in mind when testing network performance.
Locally produced blocks count
This simple metric shows the number of blocks produced by a certain validator. It depends on consensus and is essential for assessing the “usefulness” of individual validators to the network.
Since validators make money on each block, they take care of the stable operation and safety of their machines. This metric helps determine which validator candidate is the most qualified, protected, and prepared to work in a public network holding real users’ assets. The metric value can be publicly checked: simply download the blockchain and count the blocks.
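A sketch of such a check, assuming an iterable of decoded blocks each carrying a `producer` field (the real field name, e.g. miner or signer, differs per chain):

```python
from collections import Counter

def blocks_per_validator(blocks):
    """Count locally produced blocks per validator from public chain data."""
    return Counter(block.producer for block in blocks)

# Hypothetical usage, once the downloaded chain is decoded into blocks:
# for validator, produced in blocks_per_validator(blocks).most_common():
#     print(validator, produced)
```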
Finality and Last Irreversible Block
Finality ensures that all transactions included in the blockchain up to the finalized block are never rolled back or replaced by another chain fork. It’s a way for PoS networks to protect against double spend attacks and to confirm cryptocurrency transactions for users.
A user considers a transaction final when there’s a block that finalizes the chain containing this transaction, not when the transaction is simply accepted by the node. To finalize a block, validators must receive this block over the p2p network and exchange signatures with each other. This is where the real speed of a blockchain shows, as the moment of finalization is what matters most to users.
Finality algorithms also differ, intersect, and combine with the main consensus (for further reading: Casper in Ethereum, Last Irreversible Blocks in EOS, GRANDPA in Parity Polkadot, and their modifications, for example MixBytes RANDPA).
For networks in which not every block is finalized, a useful metric is the delay between the latest finalized block and the current latest block. This number shows how far behind the validators are while agreeing on the correct chain. If the gap is large, the finality algorithm requires additional analysis and optimization.
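Computing the lag is trivial once the node exposes both heights. A sketch with hypothetical getters (map them to your chain's “latest” vs “finalized” block queries); the alert threshold is an arbitrary example:

```python
def finality_lag(node):
    """Number of blocks between the chain head and the last finalized block.
    node.head_height() and node.finalized_height() are hypothetical getters."""
    lag = node.head_height() - node.finalized_height()
    if lag > 100:  # example threshold, tune per chain
        print(f"warning: finality lags {lag} blocks behind the head")
    return lag
```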
The peer-to-peer subsystem, the intermediate layer of blockchain networks, is often neglected, yet it is responsible for all the vague delays in delivering blocks and transactions between validators.
When the number of validators is small, they are co-located, peer lists are hard-coded, and everything works well and quickly. But as soon as there are more validators, nodes become geographically distributed, and packet loss is emulated, we face a significant “tps” drop.
For example, when testing EOS consensus with the additional finality algorithm, increasing the number of validators to 80-100 machines distributed across four continents had little effect on finality.
At the same time, increased packet loss strongly affected finality, which proves the need for additional p2p layer configuration for greater resistance to packet loss (rather than to high latency). Unfortunately, there are many different settings and factors, and only benchmarks allow us to understand the required number of validators and achieve a fairly comfortable blockchain speed.
The p2p subsystem configuration becomes clear from the documentation; for example, have a look at [libp2p], the [Kademlia] protocol, or [BitTorrent].
Important p2p metrics include the number of connected peers, the amount of received/sent p2p traffic, and the hit/miss ratio when requesting data from other nodes. For example, a large miss count when accessing data means that only a small number of nodes have the requested data and don’t have time to distribute it to everyone. The amount of received/sent p2p traffic helps identify nodes struggling with network configuration or channel issues.
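A sketch of deriving these indicators from raw counters; the counter names are invented for illustration, real nodes export similar values under different names.

```python
def p2p_health(stats):
    """Derive p2p health indicators from raw node counters (hypothetical names)."""
    requests = stats["data_requests"]
    misses = stats["data_misses"]
    # A high miss ratio: few nodes hold the data and cannot seed it in time.
    miss_ratio = misses / requests if requests else 0.0
    return {"miss_ratio": miss_ratio,
            "rx_bytes": stats["bytes_received"],
            "tx_bytes": stats["bytes_sent"]}

print(p2p_health({"data_requests": 1000, "data_misses": 230,
                  "bytes_received": 5_000_000, "bytes_sent": 4_200_000}))
```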
Standard system metrics of blockchain nodes are described in a large number of sources, so we will be brief. They help to find logic bottlenecks and errors.
CPU
CPU shows how many calculations the processor performs. If CPU load is high, the node is computing something, actively using logic or the FPU (the latter is almost never used in blockchains). High load happens, for example, because the node is checking electronic signatures, processing transactions with heavy cryptography, or making complex calculations.
CPU usage can be broken down into further metrics that point to code bottlenecks: for example, system time (time spent in kernel code), user time (time spent in user processes), io wait (waiting for i/o from slow external devices such as disk or network), etc. Here is a good article on the topic.
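On Linux, this breakdown can be read directly from /proc/stat; a minimal sketch:

```python
def cpu_breakdown():
    """Split cumulative CPU time into user / system / iowait shares (Linux).
    The aggregate "cpu" line of /proc/stat lists, in order:
    user nice system idle iowait irq softirq ..."""
    with open("/proc/stat") as f:
        fields = [int(x) for x in f.readline().split()[1:]]
    total = sum(fields)
    user = fields[0] + fields[1]       # user + nice
    system, iowait = fields[2], fields[4]
    return {"user_%": 100 * user / total,
            "system_%": 100 * system / total,
            "iowait_%": 100 * iowait / total}

print(cpu_breakdown())
```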
Memory
Modern blockchains use key-value databases (LevelDB, RocksDB), which constantly keep “hot” data in memory. Any loaded service can suffer from memory leaks caused by errors or targeted attacks on the node code. If memory consumption is growing steadily or has jumped sharply, it is most likely due to a large number of state database keys, large transaction queues, or an increased amount of messages between node subsystems.
Memory under-utilization suggests that block data limits or maximum transaction complexity could be increased.
Full nodes that respond to network clients rely on file cache metrics. When clients access various parts of the state database and the transaction log, old blocks read from disk can evict newer data from the cache, which in turn slows down the client response speed.
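A sketch of watching both numbers on Linux: the node process's resident memory (leaks, growing queues) and the system page cache (hot data being evicted). The node's process id is an assumption you must supply.

```python
def memory_snapshot(pid):
    """Resident memory of the node process plus system page cache (Linux).
    Steady VmRSS growth hints at leaks or growing internal queues; a shrinking
    Cached value means disk reads are evicting hot state-database pages."""
    with open(f"/proc/{pid}/status") as f:
        rss_kb = next(int(line.split()[1]) for line in f
                      if line.startswith("VmRSS:"))
    with open("/proc/meminfo") as f:
        cached_kb = next(int(line.split()[1]) for line in f
                         if line.startswith("Cached:"))
    return {"node_rss_mb": rss_kb // 1024, "page_cache_mb": cached_kb // 1024}

# print(memory_snapshot(12345))  # 12345 = pid of the blockchain node
```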
Network
The main network metrics are the amount of traffic in bytes, the number of sent and received network packets, and the packet loss ratio. These metrics are often underestimated, because blockchains are not yet able to process transactions at a speed of 1 Gbit/s.
At the moment, some blockchain projects allow users to share their WiFi or provide services for storing and sending files or messages. When testing such networks, quantity and quality of the network interface traffic become extremely important, as a crowded network channel affects all other services on the machine.
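Per-interface counters are available on Linux from /proc/net/dev; a sketch (the interface name is an assumption):

```python
def interface_traffic(iface="eth0"):
    """Bytes, packets, and drops for one interface (Linux).
    /proc/net/dev columns: rx bytes/packets/errs/drop ... tx bytes/packets ..."""
    with open("/proc/net/dev") as f:
        for line in f:
            name, _, data = line.partition(":")
            if name.strip() == iface:
                v = [int(x) for x in data.split()]
                return {"rx_bytes": v[0], "rx_packets": v[1], "rx_drop": v[3],
                        "tx_bytes": v[8], "tx_packets": v[9]}
    raise ValueError(f"interface {iface} not found")

# print(interface_traffic("eth0"))  # interface name depends on the machine
```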
Storage
The disk subsystem is the slowest component of any service and often causes serious performance issues. Excessive logging, an unexpected backup, an inconvenient read/write pattern, a large total blockchain volume - all this can lead to a significant node slowdown or to excessive hardware requirements.
The way a blockchain transaction log works with the disk resembles that of DBMSs using a Write-Ahead Log (WAL). Technically, the transaction log can be regarded as a WAL for the state database.
Therefore, storage metrics that help identify bottlenecks in modern key-value databases are important: the number of read/write IOPS, max/min/avg latency, and many other metrics that help optimize disk operations.
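A sketch of sampling read/write IOPS for one device from /proc/diskstats on Linux (the device name is an assumption):

```python
import time

def disk_counters(device="sda"):
    """Cumulative I/O counters for one block device from /proc/diskstats."""
    with open("/proc/diskstats") as f:
        for line in f:
            p = line.split()
            if p[2] == device:
                return {"reads": int(p[3]), "writes": int(p[7]),
                        "ms_reading": int(p[6]), "ms_writing": int(p[10])}
    raise ValueError(f"device {device} not found")

def iops(device="sda", interval=1.0):
    """Approximate read/write IOPS over a short sampling interval."""
    a = disk_counters(device)
    time.sleep(interval)
    b = disk_counters(device)
    return {"read_iops": (b["reads"] - a["reads"]) / interval,
            "write_iops": (b["writes"] - a["writes"]) / interval}

# print(iops("sda"))  # device name depends on the machine
```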
To sum up, we can group the metrics into: local node metrics (processed transactions, processing time, locally produced blocks), consensus and finality metrics, p2p subsystem metrics, and standard system metrics (CPU, memory, network, storage).
Each group is important, since there can be subsystem errors limiting the operation of other components. A slowdown of even a small number of validators can seriously impact the entire network.
The most tricky errors in consensus and finality algorithms arise only due to a large transaction flow or consensus parameter changes. Their analysis requires reproducible testing conditions and complex load scenarios.