This article covers a small side project from Noncesense Research Lab, a non-profit decentralized collective of developers, data scientists, and assorted blockchain nerds that work together to study interesting phenomena in consensus ecosystems. You are welcome to join our public discussion on IRC — feel free to introduce yourself, chat pseudorandomly, or just lurk and learn.
This week, I was looking for a lighthearted research project for the holidays, and thought it would be amusing to toy around with visualizing reported nonce values. Neptune Research (co-founder and developer for the Monero Archival Project) kindly provided a data dump of nonces from the Monero blockchain.
I expected that exploratory analysis on this data would be an entirely nonsensical endeavor, since possible winning values are scattered randomly through a huge input range that miners sample by brute force. In fact, I joked in the Monero Research Lab: My Jupyter notebook at the end will demonstrate that I spent an hour on a completely useless task, thus itself being a “proof of work”…
However, I encountered several very distinct patterns in the winning nonces. A few histograms and visualizations revealed signatures of multiple unique nonce search strategies being used by miners today! You can follow along with the analysis in this Jupyter Notebook.
Consider this Monero block that showed up as I am writing this article. Here is a simplified version of some of the information that it contains:
This (real) example block contains pointers to 7 transactions that are being mined from the memory pool. The miner marks the height of the block and a pointer toward the previous block.
Lastly, the miner must select a random value for the nonce, bundle it with the rest of the information, and hash the whole block (including the nonce). If the output hash has enough 0’s at the end, the miner has successfully completed the block. In this case, the miner found that the nonce 944 caused the output hash:
The nonce field is allowed to contain any 32-bit integer. In other words, the miner can include any whole number between 0 and 4,294,967,295 that causes a hash value that meets the network’s difficulty threshold.
After a miner prepares the block (excluding the nonce) they must search the nonce space by trial and error to find a valid nonce. Note that there are generally many values scattered randomly between 0 and 4.3 billion that make an acceptable nonce, and a miner only has to find any one of them!
If miners are randomly sampling values between 0 and 4.3 billion, and the winning nonces are randomly distributed along that same range, then there should be no pattern in the values recorded in the blockchain.
In this case, we would expect the distribution to be “uniform” — in other words, we expect approximately the same number of blocks to be completed with nonces between 1 and 2 billion, as there are blocks that were completed with nonces between 2 and 3 billion.
When we actually plot a histogram of the winning nonces, a significant bias toward small values becomes apparent, from the enormous spike at the left-most bar:
Let’s zoom in on that giant bar at the low end of the nonce range, between 0 and 10,000:
We see that more and more blocks have been found at small nonce values, which suggests that many miners start brute forcing nonces from zero, and scan linearly through the search space.
The scale of this practice is clearer from a look at the cumulative distribution function of nonce values:
With the linear x-axis, the CDF appears to start at y = 0.6 because the majority of nonces are reported between 0 and 10,000 — and that region is only 1 pixel wide on the above plot.
This distribution in the commonly-reported nonce range is much easier to visualize when we re-plot the figure with a log x-axis:
Remarkably, we see that half of the nonces are found between 0 and 10⁵, which covers only a small fraction of the possible values!
For Monero, 50% of winning nonces come from 0.002% of the search space
The fraction of blocks found in this small range can be used as a proxy for the number of miners following the search strategy. In this case, it appears that approximately half of the hash rate is using software that begins a linear nonce search from zero.
Other observable nonce search algorithms
Some less common approaches were also observed. In the previous section, we investigated the excess of nonces at very low values. However, notice that there are also some bumps near the middle and large end of the nonce space.
Before we investigate those as well, let’s look at a random slice of the search space and see if there is any structure. We’ll just zoom in on the distribution of nonces between 60000000 and 61000000. As we would expect, there is no pattern, and the values are roughly uniformly distributed.
Now, with the above uniform control case in mind, let’s look at the top end of the search space. Did anybody write software that starts at the maximum nonce value (2³²) and scans backwards?(note the x-axis offset 4.3*10⁹ ~ 2³²)
Indeed, we see a scan pattern with a local maximum toward the highest bin and a dropoff toward lower values! This occurs on a much smaller scale than scanning upward from zero, but the pattern is still very clear.
For the last example, let’s zoom in around the halfway point. (note the x-axis offset 2.1*10⁹ ~ 2³²/2)
We see another similar scan pattern on a very small nonce window! This implies that there is some piece of mining software that starts halfway through the search space and mines upwards, brute testing each nonce candidate.
We can use this type information to put weak statistical bounds on hashrate originating from any particular piece of software that uses a particularly distinct nonce search pattern.
For example, suppose that mining software X searches the nonce space by starting at the halfway point (~2.1 billion) and iterating upwards. Any blocks with a nonce less than 2.1 billion most likely did not originate from Software X. Glancing back at the CDF, we note that this partitions the blockchain into ~70% of blocks that could NOT have originated from software X, and ~30% of blocks that could have originated from software X.
This particular clear feature is contained within another very small window of the search space (0.002%) starting at the halfway mark. By integrating under the histogram on this small slice, we see that ~50000 blocks (out of 1.6 million) appear to have been generated by software that scans upwards from the halfway point. This heuristically partitions the pool into 3% of blocks that may have come from software X, and 97% of blocks that originated from software following a different search strategy.
This type of heuristic partitioning is not very informative (i.e. is not privacy damaging) for software that uses the two “normal” search patterns, i.e.
- Scan through nonces linearly, starting from 0
- Sample nonces randomly.
However, particularly unusual search patterns (such as scanning from the center) cause blocks to appear in nonce-value clusters that likely originate from the same software or 2+ entities following an identical unusual strategy (see Occam’s razor).
Update 2018–01–07: Antoine Le Calvez just shared a fantastic visualization that highlights how some of these drastic nonce patterns evaporated abruptly during the 2018 network upgrade, which was the topic of my first Hackernoon article.
What does this mean for you?
These analyses and results have absolutely no impact on the privacy and security of general Monero users.
If you use Monero for transactions and do not mine yourself, then this has no impact on your privacy or security.
If you are a miner who uses software that employs a common search pattern (scan upward from zero, or sample randomly), then this has effectively no bearing on your privacy or security.
If you are a miner that is using custom/obscure software that mines with an unusual nonce space search pattern (e.g. down from the maximum, or up from the midway point), then you should switch to random sampling so that your blocks blend into your transaction trees and the overall blockchain. Note that for each new block, you only need to call
random() once for the first nonce, then you can iterate sequentially in either direction — this will entirely remove statistical links between your blocks’ nonces.
While the choice of search algorithm does not affect any individual miner or their profits, it is best for the ecosystem privacy if all parties used a random search start. Any patterns or trends that allow an anonymous pool to be heuristically partitioned indicate that its objects are weakly distinguishable, which is sub-optimal from a privacy perspective.
Do you find this kind of exploratory data analysis interesting? You can work on similar projects during the Insight Decentralized Consensus Fellows Program, a professional fellowship to help software engineers and scientists transition to a career creating leading-edge technologies in the blockchain space. This will be a free, full-time, 7 week fellowship based in San Francisco, starting March 2019. Apply now!