
Is the Nonce in Bitcoin Really Random? Analyzing Over 860,000 Blocks

by Javier Mateos, November 16th, 2024

Too Long; Didn't Read

Is the nonce in Bitcoin truly random? This analysis of over 860,000 blocks uncovers patterns, potential biases, and their implications for decentralization.

The graph shouldn’t look like this… or should it?


The distribution of the nonce value in Bitcoin could indicate pure randomness or, on the contrary, reveal strategies used by miners. A frequency analysis of over 860,000 blocks should reflect a trend towards randomness, but is that really the case?


Before we begin... What is the nonce?

The nonce ("number used once") in Bitcoin is a 32-bit field in the block header that miners vary to generate a valid block hash. Together with the other header fields, it is hashed to produce a value that must be at or below the target defined by the network.


The simple formula for calculating the block hash is as follows:


Block hash = SHA-256(SHA-256(block header))


In other words, with + denoting the concatenation of the serialized header fields:

Block hash = SHA-256(SHA-256(version + previous block hash + Merkle root + timestamp + bits (encoded difficulty target) + nonce))
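As a minimal sketch of the formula above, the following Python snippet serializes the six header fields and applies SHA-256 twice. It assumes the standard Bitcoin serialization (4-byte little-endian integers, and 32-byte hashes stored byte-reversed relative to how they are usually displayed) and uses the genesis block's header as a test case. The helper names are illustrative, not taken from any particular library.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice, as in the block-hash formula."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def serialize_header(version: int, prev_hash_hex: str, merkle_root_hex: str,
                     timestamp: int, bits: int, nonce: int) -> bytes:
    """Concatenate the six header fields into the 80-byte block header.

    Integers are packed as 4-byte little-endian values; the two 32-byte
    hashes are given in display (big-endian) hex and reversed into the
    internal byte order used inside the header.
    """
    return (struct.pack("<I", version)
            + bytes.fromhex(prev_hash_hex)[::-1]
            + bytes.fromhex(merkle_root_hex)[::-1]
            + struct.pack("<III", timestamp, bits, nonce))

def block_hash(header: bytes) -> str:
    # The double-SHA-256 digest is conventionally displayed byte-reversed.
    return double_sha256(header)[::-1].hex()

# Header fields of the genesis block (block 0).
genesis = serialize_header(
    version=1,
    prev_hash_hex="00" * 32,
    merkle_root_hex="4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    timestamp=1231006505,
    bits=0x1D00FFFF,
    nonce=2083236893,
)
print(len(genesis))         # 80
print(block_hash(genesis))  # 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
```

Changing only the last argument (the nonce) and calling block_hash again is exactly the step a miner repeats billions of times.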


The nonce is the value that miners adjust in each attempt to find a hash that meets the network’s target. The possible range for the nonce is from 0 to 4,294,967,295. This process is repeated until the generated hash is valid, allowing the block to be added to the blockchain.

After the block header fields are concatenated, the SHA-256 hash function is applied, and then SHA-256 is applied again to the result of that first step.


The formula for calculating the block hash is computationally simple: the data is concatenated and the SHA-256 algorithm is applied twice. However, for the resulting hash to be valid, it must meet a specific requirement: read as a 256-bit number, it must be at or below the target set by the network's difficulty, which in practice means it must start with a certain number of zeros. If the inputs to the function stay the same, the resulting hash is always the same. This is where the nonce comes into play: a one-time number that is continuously adjusted to change the hash outcome.
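The "starts with a certain number of zeros" rule is really a numeric comparison against a target, which the header stores in compact form in its bits field. The sketch below decodes that compact encoding and checks a hash against it. It is a simplified illustration (it ignores the sign bit and the overflow checks a real node performs), and the function names are hypothetical.

```python
def bits_to_target(bits: int) -> int:
    """Decode the compact 'bits' field into the full 256-bit target.

    Top byte = exponent (length in bytes), low 3 bytes = mantissa.
    Simplified: sign-bit and overflow handling are omitted.
    """
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))

def meets_target(hash_hex: str, bits: int) -> bool:
    """A block hash is valid when, read as a number, it is <= the target."""
    return int(hash_hex, 16) <= bits_to_target(bits)

GENESIS_HASH = "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f"
target = bits_to_target(0x1D00FFFF)
print(f"{target:064x}")                        # 00000000ffff followed by 52 zeros
print(meets_target(GENESIS_HASH, 0x1D00FFFF))  # True
```

With bits = 0x1D00FFFF (the minimum difficulty used by the genesis block), the target begins with eight zero hex digits, so any valid hash must too.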


This process relies on what is known as the "avalanche effect": even the slightest change in the input (such as altering the nonce) completely changes the hash. Miners test different nonce values until they find a hash that meets the difficulty level required by the network. This makes the proof of work costly to produce and ensures that the final block meets the established security requirements.
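To see this "cascading" (avalanche) effect in practice, the sketch below hashes two inputs that differ only in the nonce (0 versus 1) and counts how many of the 256 output bits change. The all-zero 76-byte header prefix is a made-up placeholder. For a well-behaved hash function, roughly half of the bits (about 128) are expected to flip.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Hypothetical 76-byte header prefix (every field except the nonce).
prefix = bytes(76)
h0 = double_sha256(prefix + struct.pack("<I", 0))  # nonce = 0
h1 = double_sha256(prefix + struct.pack("<I", 1))  # nonce = 1

print(h0[::-1].hex())
print(h1[::-1].hex())
print(differing_bits(h0, h1), "of 256 bits differ")  # typically close to 128
```

Because the output gives no hint about which direction to move the nonce, the only strategy left is to keep trying values.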


An important point is that, in addition to the nonce, the timestamp is another dynamic field that affects the hash. It records approximately when the block is being mined and is updated as time passes. Because any change to the timestamp produces a completely different hash, the nonce search effectively starts over with the new header, testing nonce values again to find a valid hash. The rest of the block's data, such as the list of transactions and therefore the Merkle root, may also change, though less often over short periods (e.g., every second), unless new transactions are included or network conditions change.
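A quick back-of-the-envelope calculation shows how this plays out for different hardware. The header timestamp has one-second resolution, so the question is how long it takes to try all 2^32 nonces for a single header. The hash rates below are assumed round figures for illustration, not measurements: a 10 MH/s machine needs about 7 minutes per sweep, while a 100 TH/s machine finishes in roughly 43 microseconds, far less than one timestamp tick. At that speed, some other header input (the timestamp, or the Merkle root via a change to the coinbase transaction) has to change between sweeps.

```python
NONCE_SPACE = 2**32  # nonces 0 .. 4,294,967,295

def seconds_to_exhaust_nonces(hashes_per_second: float) -> float:
    """Time needed to try every 32-bit nonce once for one fixed header."""
    return NONCE_SPACE / hashes_per_second

# Illustrative hash rates (assumed orders of magnitude, not measurements).
for label, rate in [("CPU-class, 10 MH/s", 10e6), ("ASIC-class, 100 TH/s", 100e12)]:
    print(f"{label}: {seconds_to_exhaust_nonces(rate):.6g} s per full nonce sweep")
```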

Do miners use the entire range of possible nonce values?

If you roll a die, the probability of landing on a specific number, like 4, is 1/6. Even if you roll the die many times and it always lands on 4, the probability of landing on 4 on the next roll remains 1/6... unless the die is rigged 😉.


Similarly, in Bitcoin mining, miners use brute force to find a nonce that causes the block hash to meet the difficulty target (e.g., the hash must start with 18 zeros). To do this, they test nonce values sequentially: first 0, then 1, then 2, and so on. The valid hash could be found with nonce number 3,245,231, or even with nonce number 3.
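The sequential search described above can be sketched as a simple loop. This is a toy, single-threaded illustration with a hypothetical all-zero header prefix and a deliberately easy target (about 1 in 256 hashes qualifies), so it finishes almost instantly; real miners run the same idea in ASIC hardware against a vastly harder target.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(prefix: bytes, target: int, start: int = 0, end: int = 2**32):
    """Try nonces start, start + 1, ... and return the first valid one.

    Returns (nonce, displayed_hash_hex), or None if [start, end) is exhausted.
    """
    for nonce in range(start, end):
        digest = double_sha256(prefix + struct.pack("<I", nonce))
        # Bitcoin compares the digest as a little-endian 256-bit integer,
        # which equals the number shown by the byte-reversed hex string.
        value = int.from_bytes(digest, "little")
        if value <= target:
            return nonce, digest[::-1].hex()
    return None

# Hypothetical header prefix and a deliberately easy target:
# the top 8 bits must be zero, so roughly 1 in 256 hashes qualifies.
EASY_TARGET = 2**248 - 1
result = mine(bytes(76), EASY_TARGET)
print(result)
```

Note that the loop always returns the lowest qualifying nonce at or above start, which is why the starting point of a search can shape where winning nonces tend to fall.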


In large mining operations, such as farms or pools, the range of possible nonce values is divided among multiple miners, assigning specific segments to avoid overlap. For example, one miner might handle the range of nonces from 0 to 1,000,000, while another works on the range from 1,000,001 to 2,000,000. This helps optimize resources and increases the efficiency of finding the valid hash.
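One way to divide the nonce range among several miners without overlap is to split it into contiguous segments, as in the example above. The function below is a hypothetical sketch of that idea, not the scheme used by any specific pool or firmware; each worker would then scan only its own (first, last) segment.

```python
NONCE_MAX = 2**32 - 1  # 4,294,967,295

def nonce_segments(workers: int):
    """Split 0..NONCE_MAX into `workers` contiguous, non-overlapping ranges.

    Returns a list of inclusive (first, last) pairs. If the space does not
    divide evenly, the remainder is spread one value at a time over the
    first segments.
    """
    size, extra = divmod(NONCE_MAX + 1, workers)
    segments, start = [], 0
    for i in range(workers):
        length = size + (1 if i < extra else 0)
        segments.append((start, start + length - 1))
        start += length
    return segments

for first, last in nonce_segments(4):
    print(f"{first:>13,} .. {last:>13,}")
```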


Whether it’s individual or collective mining, the search for this number remains random and unpredictable. But... has it always been like this? Will it always be? Or can the die be "rigged"? 😉

Let's look at the data

As we mentioned earlier with the dice example, the probability of getting any given result when rolling a die is 16.67% (1/6), regardless of previous rolls. As we accumulate more rolls, the observed frequencies should tend toward this theoretical value.


Does the same happen with nonce values? To analyze this, we divided the range of nonce values into 16 equal parts of 268,435,456 values each. Following the same reasoning as with the dice, the theoretical probability of a nonce falling within any given range is 6.25% (i.e., 1/16).
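The 16-range bucketing can be expressed in a few lines. This sketch assumes you already have a list of block nonces (for example, extracted from a node); the sample list below is made up, apart from 2,083,236,893, which is the genesis block's nonce. Each range covers 2^32 / 16 = 268,435,456 values, so the range index is simply the nonce divided by that size, and the above/at-or-below flag mirrors the green/red colouring of the tables.

```python
from collections import Counter

RANGES = 16
RANGE_SIZE = 2**32 // RANGES  # 268,435,456 nonce values per range

def range_index(nonce: int) -> int:
    """0-based index (0..15) of the range a 32-bit nonce falls into."""
    return nonce // RANGE_SIZE  # equivalent to nonce >> 28

def range_percentages(nonces):
    """Percentage of nonces in each of the 16 ranges (expected: 6.25% each)."""
    counts = Counter(range_index(n) for n in nonces)
    total = len(nonces)
    return [100.0 * counts.get(i, 0) / total for i in range(RANGES)]

# Hypothetical sample; a real analysis would use every block's nonce.
sample = [0, 5, 268_435_456, 2_083_236_893, 4_294_967_295]
for i, pct in enumerate(range_percentages(sample), start=1):
    flag = "above" if pct > 100 / RANGES else "at/below"
    print(f"Range {i:02d}: {pct:6.2f}%  ({flag} 6.25%)")
```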


However, the data shows some deviations from this expected probability. To make the analysis easier, cases where the percentage exceeds 6.25% are marked in green, and cases where it is equal to or below this value are marked in red. The tables cover every block from the first block to block 867,366:


Years 2009-2012


Years 2013-2016


Years 2017-2020


Years 2021–2024* (up to block 867,366 … 25/10/2024)


In the provided tables, specific patterns can be observed at a glance through the use of color. The 6.25% share should remain relatively constant if there is no bias in the nonce values. However, the data shows significant variations, especially in the early ranges, such as Range 01, where the observed frequency consistently exceeded the theoretical value, peaking at 49.62% in 2010. This indicates a clear preference for low nonce values in the early years.


This suggests that, in the beginning, the generation of nonce values was not as random as expected, possibly due to less sophisticated mining methods. As mining technology advanced (both in quantity and quality), this overconcentration in low ranges decreased, aligning more with the theoretical probability of 6.25% in the following years.
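One way to put a number on "significant variations" is Pearson's chi-square goodness-of-fit test against a uniform distribution over the 16 ranges. The sketch below computes the statistic and compares it with the 5% critical value for 15 degrees of freedom (24.996, from standard chi-square tables). The counts are hypothetical, not the article's data.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic of observed counts against a uniform split."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# Critical value for 15 degrees of freedom (16 ranges - 1) at the 5% level.
CRITICAL_15DF_5PCT = 24.996

def looks_non_uniform(counts) -> bool:
    """True when the deviation from 6.25% per range is unlikely under pure chance."""
    return chi_square_uniform(counts) > CRITICAL_15DF_5PCT

uniform = [1000] * 16
skewed = [1500] + [1000] * 15  # hypothetical excess in Range 01
print(chi_square_uniform(uniform), looks_non_uniform(uniform))
print(round(chi_square_uniform(skewed), 1), looks_non_uniform(skewed))
```

With tens of thousands of blocks per year, even modest deviations can exceed the critical value, so the statistic is best read alongside the size of the percentage differences themselves.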


Next, these results and their potential implications will be examined.

Temporal Evolution of the Ranges

Years 2009–2024* (up to block 867,366 … 25/10/2024)


Analyzing the annual evolution, the following points can be highlighted:


  • 2009–2010: The distribution of nonces shows a clear concentration in the early ranges (Range 01 and Range 02), with percentages significantly exceeding the theoretical probability. This indicates lower complexity in the initial mining algorithms and potential reuse of low nonces.
  • 2011–2015: During this phase, a gradual correction is observed. Percentages in the early ranges decrease, approaching the expected 6.25% probability. This coincides with the adoption of more efficient mining hardware and improvements in nonce generation algorithms.
  • 2016–2020: The distribution becomes more uniform, with smaller variations between ranges. However, Range 04 and Range 12 show occasional peaks, suggesting some preference or recurring pattern in the mining algorithms used during these years.
  • 2021–2023: From 2021, more pronounced fluctuations are observed, especially in the first 5 ranges. This could reflect changes in the mining ecosystem, such as the consolidation of large mining pools and the possible implementation of strategies to optimize the use of low nonces.
  • 2024: The data for 2024 show stabilization in the percentages, although there is still a slight trend towards higher utilization of the earlier ranges. This could be due to the persistence of certain optimized mining algorithms or adjustments in miners' strategies.

Reducing the analysis to just two ranges

In this analysis, we are considering only two possible ranges for the nonces:

  • Range A: From 0 to 2,147,483,647 (the lower half of the total range: 2,147,483,648 values, or 50% of the total space).
  • Range B: From 2,147,483,648 to 4,294,967,295 (the upper half of the total range, also 50% of the total space).


The theoretical probability for each range is 50%, as both cover half of the possible nonce value space.
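For the two-range split, a normal approximation to the binomial distribution gives a quick check of whether the share in Range A is further from 50% than chance alone would explain. This is a minimal sketch assuming each block's nonce independently lands in either half with probability 0.5; the counts used below are hypothetical.

```python
import math

HALF = 2**31  # Range A: 0..2,147,483,647; Range B: 2,147,483,648..4,294,967,295

def lower_half_share(nonces):
    """Return (number of nonces in Range A, total number of nonces)."""
    lower = sum(1 for n in nonces if n < HALF)
    return lower, len(nonces)

def z_score(lower: int, total: int) -> float:
    """Normal-approximation z-score of the Range A count against p = 0.5."""
    expected = total / 2
    std = math.sqrt(total * 0.25)  # sqrt(n * p * (1 - p)) with p = 0.5
    return (lower - expected) / std

print(z_score(5100, 10000))  # 2.0
```

As a rule of thumb, |z| > 1.96 corresponds to a deviation that would be unlikely under pure chance at the 5% level.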


Percentage of nonce values in just two ranges, splitting the value range in half.


Years 2009 to 2016


Years 2017 to 2024* (up to block 867,366 … 25/10/2024)


It is evident that, in addition to the earlier periods (as analyzed previously), in the last 7 years there has been a persistent trend towards the appearance of the "winning" nonce in the first range, i.e., within the first 2,147,483,648 values. This preference is particularly notable, as the theoretical probability for both ranges should be 50%. However, the data shows that during this period, the lower range has had a higher incidence, consistently surpassing the theoretical mark.


Looking at the bigger picture, the following question arises: Are there specific strategies that contribute to this over-concentration in the lower range, despite the process theoretically being completely random and fair for both ranges?

Strategic Interpretations

The observed distribution of nonce values provides important strategic insights for miners. The fact that lower ranges have historically shown a higher frequency of use suggests several key points:


  1. Optimization of Low Nonce Usage: Miners seem to have adjusted their practices to efficiently exploit nonces within the lower ranges. This strategy might be related to reduced latency in finding valid blocks, as working with nonces in a known or familiar range could shorten the search time.
  2. Risk of Centralization: The preference for certain nonce ranges can lead to potential centralization. Large mining pools might be configuring their algorithms to intentionally explore nonces in the lower ranges, thereby reducing the randomness expected on the network. This behavior could increase the risk of power concentration in a few actors, affecting the decentralization inherent in the network.
  3. Strategies to Mitigate Bias: To avoid these predictable patterns and promote a more equitable distribution of nonces used, it would be wise to consider adjustments to mining algorithms. These changes could include methods to randomly distribute attempts across the entire possible range, thus mitigating the exploitation of specific ranges.
  4. Impact of Modern ASIC Hardware and Firmware: While newer mining equipment has increased efficiency, it could also introduce biases toward lower nonces due to internal optimizations. Hardware developers may have identified certain patterns that reduce computational time, which would explain this preference.
  5. Mining Software-Specific Algorithms: Mining pools and the software they use may be implementing strategies that prioritize low-range nonces. This could stem from empirical observations of previous blocks that showed greater success in these ranges or simply from design decisions aimed at improving performance.
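As a sketch of the mitigation in point 3, a miner could begin each sweep at a random offset and wrap around, instead of always starting at 0. This is a hypothetical illustration, not a documented feature of any mining software. Because every nonce has the same chance of producing a valid hash, it would not change the expected time to find a block; it would only spread winning nonces more evenly across the range.

```python
import random

NONCE_SPACE = 2**32

def scan_order(start: int, count: int, space: int = NONCE_SPACE):
    """Yield `count` nonces beginning at `start`, wrapping around at `space`."""
    for i in range(count):
        yield (start + i) % space

def random_start(rng: random.Random, space: int = NONCE_SPACE) -> int:
    """Pick a uniformly random starting nonce (hypothetical mitigation)."""
    return rng.randrange(space)

rng = random.Random(42)  # fixed seed only to make the demo repeatable
start = random_start(rng)
print(start, list(scan_order(start, 3)))
```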


The recent data from 2023 and the partial data for 2024 show a slight stabilization, although significant fluctuations are still observed in the lower ranges. This could indicate:


  1. Changes in Mining Hardware and Software: Advances in ASICs and mining software may be automatically adjusting the nonce ranges to maximize efficiency. These changes could result in more targeted exploration of specific ranges to optimize mining performance and reduce computational resources.
  2. Impact of External Events: Events such as Bitcoin halvings may influence the distribution of nonces, as miners adjust their strategies to adapt to changes in rewards. The reduction in block rewards after a halving could lead miners to focus on strategies that enhance the likelihood of finding valid nonces more quickly, possibly favoring certain ranges over others.



Conclusion

The analysis of nonce values in Bitcoin reveals that, although the mining process should be random, there are significant patterns, particularly in the early years and in the lower ranges of values. This bias towards lower nonces can be attributed to the lack of sophistication in early mining algorithms, but it also suggests the possibility that miners are using strategies to optimize the search for valid blocks, which may be reducing the expected randomness.


The recent trend shows slight stabilization, although fluctuations persist, especially in the lower ranges. This could result from advances in mining hardware and software, allowing further optimization of the process, but it may also contribute to the centralization of mining power. If these patterns continue, there is a risk that mining could concentrate further in a few actors, potentially compromising the decentralization of the network.


Ultimately, the observed behavior could be interpreted as a natural adaptation by miners to a competitive environment, but it raises the question of whether it is possible to "mark the die" to gain an advantage.


Source: Data obtained from my own Bitcoin validator node.