In a [previous article](https://hackernoon.com/fraud-proofs-secure-on-chain-scalability-f96779574df), I wrote a summary of a paper about Fraud Proofs written by Mustafa Al-Bassam, Alberto Sonnino, and Vitalik Buterin. They propose a secure on-chain scaling solution that gives strong security guarantees.\n\nThe goal of this article is to verify their results regarding the data-availability section by simulating the system. The paper uses some well-known combinatorics formulas to prove the security of the network in various setups. Since the model is probabilistic, we can write a program to simulate it. After running the simulation enough times and averaging the results, we should hopefully get the same results.\n\nI decided to do this not because the mathematics of the paper didn’t convince me, but because I thought it would be fun to see if simulating the system would yield the same results. Moreover, simulating the system may be another way to compute results faster than calculating a particularly resource-intensive formula; in particular, for populating Table 1 in the paper.\n\nBefore talking about the simulation, let’s first have a minimal intro to the relevant part of the paper.\n\nIn a nutshell, a fraud-proof is a tool that lets light-nodes receive a proof that some block is invalid. An honest full-node packs the minimum amount of information necessary for a light-node to check the block and be convinced that something is wrong. Doing this in a scalable way involves making tradeoffs between proof succinctness and real information density in the block.\n\nHowever, to make fraud proofs, block data is a fundamental requirement. If we don’t have the data to prove that a state transition is wrong, how could we have all the necessary information to convince another node that this happened? 
(For the moment forget about [_mathemagics_](https://eprint.iacr.org/2018/046.pdf)).\n\nThe paper solves the data-availability problem with two fundamental ideas:\n\n* Two-dimensional Reed-Solomon as erasure codes\n* Light-nodes doing random sampling of block shares\n\nThe paper combines these two ideas towards closed formulas to calculate data-availability probabilities for some set of parameters of the network:\n\n* _k_, the number of _shares_ within a block.\n* _s_, the number of shares each light-node tries to pull from a full-node to be satisfied.\n* _p_, the probability of block reconstruction made possible by light-node share sampling.\n* _c_, the number of light-nodes necessary to satisfy _p_ in a network that is configured for _k_ and _s_.\n\nRemember these parameters since they are used in the rest of this article.\n\n#### Random-sampling scheme\n\nTo understand what the simulation is doing, it is essential to understand what _light-node random sampling_ means. You can find a detailed explanation of this in section 5.3 of the paper. However, here’s the main idea so you can imagine what we’re trying to do.\n\nGiven a new block, the most important interest for the honest nodes in the network is having guarantees that the block data is available. It doesn’t necessarily mean forcing each honest node to have all the shares, but just knowing that as a _team_ they could reconstruct it.\n\nFirst, the light-nodes pull a random set of _s_ shares from the new block. The light-nodes have no coordination in which shares to ask for, so the possibility of many of them pulling the same shares is reasonable. As an analogy, you can think of getting pieces of a puzzle. The **light**\-nodes aren’t trying to pull the whole puzzle, just some random pieces. The **full**\-nodes are interested in the full puzzle.\n\nFinally, the light-nodes interact with the other full-nodes they’re connected to via _gossiping_. 
Specifically, the light-nodes notify the full-nodes they’re connected to about the pulled shares. Keeping this running between light-nodes and full-nodes, and between full-nodes and other full-nodes, allows the set of honest full-nodes to reconstruct the whole puzzle (block data).\n\nDoing the random sampling has many advantages:\n\n* Light-nodes require small bandwidth usage since they pull a small set of shares.\n* Light-nodes require small storage space.\n* Full-nodes leverage light-nodes towards the mutual goal of having all the block data.\n* Block reconstruction is decentralized.\n* It leverages the number of light-clients available. The more light-clients, the better it performs.\n\nUsing erasure codes on block data makes each extended share less important _individually_ for block reconstruction, which is an essential contribution to the random-sampling and gossiping goal.\n\nIn the full picture, the light-nodes do an extra check. Each share comes with a Merkle proof that proves that the share belongs to the block data. Since the paper discusses a 2D coding, this proof could be from the row or column dimension of the encoding. If we used an N-dimensional erasure code, it could be a proof from any of the N possible Merkle trees in which the share lives.\n\n### Simulation as a verification\n\nGenerally, a [_Monte Carlo method_](https://en.wikipedia.org/wiki/Monte_Carlo_method) is a way of doing calculations in complex systems where formal proofs are hard, or where complete computation is intractable. Its applications are so broad that I guess the term is often abused.\n\nThe very basic idea is using random sampling to calculate an _empirical mean_ of the desired output of the model with the goal of, hopefully, getting close to the real mean.\n\nWe can apply the same idea to find results by simulating a probabilistic system. We run many simulations and calculate the _empirical mean_, which should match the desired real mean. 
As a tool of verification, we can compare the result of the simulation with the results obtained in another way, e.g., a formal proof.\n\nHowever, the simulation idea is, in fact, more powerful. We could start playing with the system, tuning parameters or introducing new ideas, and quickly see how it reacts. How easy it is to _play_ with the simulation depends on how resource-intensive running a simulation instance is, and on the minimum number of iterations we want for calculating a _reliable_ empirical mean.\n\n### Fraud-proof network simulation\n\nThe authors of the paper analyze multiple properties of the solution using closed mathematical formulas. Most of the heavy-lifting math goes into calculating probabilities as a result of light-nodes doing random sampling of shares of the blocks.\n\nAs a way to check those closed formulas, we could make a program that simulates the light-node and full-node behaviors, lets them interact, and see what happens. If we run this simulation multiple times, we could make a statistical analysis of the interesting metrics and see if they match the closed formulas from the paper.\n\nAt the start of every simulation instance, a _full-node_ makes available a new block composed of 4_k²_ shares. A number _c_ of _light-nodes_ try to pull _s_ distinct random shares from the _full-node_. The full-node agrees to give shares to light-nodes up to the point where it has already shared 4_k²-(k+1)²_ distinct shares; this is the worst-case scenario for light-nodes.\n\n**When a simulation iteration reaches a point where the full-node decides to reject a share request, the iteration is considered _successful (meaning the full-node was exposed)._ By running many iterations, we can estimate _p_ by calculating the ratio of successful iterations of a setup.**\n\nThe simulator is a program written in Go and [publicly available](https://github.com/jsign/fraudproofsim); anyone is invited to see the details or improve it. 
The simulator’s CLI has three commands:\n\n* _verifypaper_: verifies the results of _Table 1_ of the paper.\n* _solve_: solves _c_ for a particular setup of _k_, _s_ and _p_.\n* _compare_: compares the Standard vs. Enhanced model proposed in the paper.\n\nMy initial motivation was only doing the first command, but after some thought and emails with Mustafa and Alberto, I decided to go a little further and implement the last two.\n\nRunning the program without commands gives some helpful info about how to use the CLI:\n\n$ git clone https://github.com/jsign/fraudproofsim.git \n$ cd fraudproofsim && go get ./... \n$ go run main.go \nIt permits to compare, solve and verify fraud-proof networks.\n\nUsage: \n fraudproofsim \[command\]\n\nAvailable Commands: \n compare Compares the Standard and Enhanced models \n help Help about any command \n solve Solves c for k, s and p \n verifypaper Verifies setups calculated in the paper\n\nFlags: \n --enhanced run an Enhanced Model \n -h, --help help for fraudproofsim \n --n int number of iterations to run per instance (default 500)\n\nUse "fraudproofsim \[command\] --help" for more information about a command.\n\nYou can see the three mentioned commands, but also two general flags:\n\n* _enhanced_, which allows you to choose running the network on an Enhanced model. The default is the Standard model.\n* _n_, the number of iterations of the simulation within a setup to calculate the desired result. The default value is 500.\n\nI’m going to show each command and discuss the results. All the runs are made on my laptop, an [i7–4710HQ](https://ark.intel.com/products/78930/Intel-Core-i7-4710HQ-Processor-6M-Cache-up-to-3-50-GHz-) with 8GB of RAM. 
Not really powerful hardware for running simulations.\n\n### verifypaper command\n\nThe idea of this command is to verify the results of Table 1 in the paper:\n\nAs the footnote mentions, evaluating Theorem 4 is extremely resource-intensive:\n\n![](https://hackernoon.com/hn-images/0*-XFdwIL2zTcC47t-)\n\nThe _verifypaper_ command has baked in the setups that correspond to each case in the table.\n\n$ go run main.go verifypaper \nk=16, s=50, c=28 => p=1 37ms \nk=16, s=20, c=69 => p=0.994 28ms \nk=16, s=10, c=138 => p=0.988 37ms \nk=16, s=5, c=275 => p=0.986 37ms \nk=16, s=2, c=690 => p=0.99 63ms \nk=32, s=50, c=112 => p=0.996 137ms \nk=32, s=20, c=280 => p=0.994 131ms \nk=32, s=10, c=561 => p=0.988 136ms \nk=32, s=5, c=1122 => p=0.992 143ms \nk=32, s=2, c=2805 => p=0.994 175ms \nk=64, s=50, c=451 => p=0.996 464ms \nk=64, s=20, c=1129 => p=0.996 536ms \nk=64, s=10, c=2258 => p=0.992 510ms \nk=64, s=5, c=4516 => p=0.988 527ms \nk=64, s=2, c=11289 => p=0.996 679ms \nk=128, s=50, c=1811 => p=0.992 2193ms \nk=128, s=20, c=4500 => p=0.702 2068ms \nexit status 2\n\nSome notes to understand these results:\n\n* Since the _n_ flag wasn’t present, the default value was used. This means each setup runs 500 times to estimate _p_.\n* Since the _enhanced_ flag wasn’t present, the Standard model is used. 
Regarding verification of the paper configurations, the kind of model used isn’t relevant, since the idea of the different models is to improve the _soundness_ of the network.\n* The letters _k_, _s_, _c_ and _p_ have the same meaning as defined in the paper.\n* For the configuration described in each line, the simulation runs and estimates _p_. It also shows how much time it took to run.\n* The last line of the output is the total time of the verification.\n\nIn all the cases with _k_ less than 128, we see that the estimated _p_ is always close to .99. This means that the simulation results agree with those of Table 1.\n\nFor _k=128_ we see that _p_ isn’t always close to 0.99. For s=50 we have good results, but for other values, we have lower probabilities than expected. This result is reasonable since the table gave some approximate results. I left these setups intentionally optimistic to see that the value of _p_ is coherent.\n\nThese results are great since we can safely say that the results of the simulation match the results obtained by the paper using the closed formulas. Moreover, we can see that the running times for each setup are quite short, which is nice too.\n\nInstead of fixing _c_ and calculating _p_, we can use the _solve_ command as I’ll show below.\n\n### solve command\n\nGenerally, verifying results is cheaper than finding them. The _verifypaper_ command above tries to verify _p_ for _k_, _s_ and _c_. But we could also populate _Table 1_ by solving _c_ for _k_, _s_ and _p_. This is what the _solve_ command does.\n\nIn particular, it finds _c_ by doing a binary search in some reasonable domain space. For each candidate _c_, _p_ is estimated. 
Depending on whether the **estimated** _p_ is greater or lower than the **desired** _p_, the value of _c_ is _binary-searched_.\n\n$ go run main.go help solve \nIt solves c for k, s and p (p, within a threshold)\n\nUsage: \n fraudproofsim solve \[k\] \[s\] \[p\] \[threshold?\] \[flags\]\n\nFlags: \n -h, --help help for solve\n\nGlobal Flags: \n --enhanced run an Enhanced Model \n --n int number of iterations to run per instance (default 500)\n\nIf we look at _Table 1_, for _k=64_, _s=10_ and _p=.99_ the value of c is 2258. Let’s _solve_ for this setup and see what happens:\n\n$ go run main.go solve 64 10 .99 0.005 \nSolving for (k:64, s:10, p:0.99, threshold:0.005) \n\[1, 16384\]: c=8192 p=1 \n\[1, 8192\]: c=4096 p=1 \n\[1, 4096\]: c=2048 p=0 \n\[2048, 4096\]: c=3072 p=1 \n\[2048, 3072\]: c=2560 p=1 \n\[2048, 2560\]: c=2304 p=1 \n\[2048, 2304\]: c=2176 p=0.002 \n\[2176, 2304\]: c=2240 p=0.902 \n\[2240, 2304\]: c=2272 p=1 \n\[2240, 2272\]: c=2256 p=0.994 \nSolution c=2256 with p=0.994 (4900ms)\n\nIn each line we can see:\n\n* _\[a,b\]_ shows where we are standing in the current step of the binary search.\n* _c_ is the candidate being evaluated.\n* _p_ is the estimated probability for that candidate _c_.\n\nAs we can see, we found a value of _c_ quite close to the exact result. 
The _threshold_ parameter is used to solve for _p_ within a range.\n\nLet’s try with a smaller threshold and a lot more iterations for the calculations:\n\n$ go run main.go solve 64 10 .99 0.0001 --n 2000 \nSolving for (k:64, s:10, p:0.99, threshold:0.0001) \n\[1, 16384\]: c=8192 p=1 \n\[1, 8192\]: c=4096 p=1 \n\[1, 4096\]: c=2048 p=0 \n\[2048, 4096\]: c=3072 p=1 \n\[2048, 3072\]: c=2560 p=1 \n\[2048, 2560\]: c=2304 p=1 \n\[2048, 2304\]: c=2176 p=0.0025 \n\[2176, 2304\]: c=2240 p=0.8955 \n\[2240, 2304\]: c=2272 p=0.9995 \n\[2240, 2272\]: c=2256 p=0.9885 \n\[2256, 2272\]: c=2264 p=0.9955 \n\[2256, 2264\]: c=2260 p=0.9945 \n\[2256, 2260\]: c=2258 p=0.992 \n\[2256, 2258\]: c=2257 p=0.994 \n\[2256, 2257\]: c=2256 p=0.9865 \n\[2256, 2257\]: c=2256 p=0.9915 \nSolution c=2256 with p=0.9915 (31346ms)\n\nThe found solution is the same, but we can see that the total running time is greater. The reason for this is twofold:\n\n* Since _n_ is greater, each simulation for the candidate _c_ takes more time.\n* Since _threshold_ is smaller, the binary search goes further in getting close to 0.99.\n\nNow we’ll try to calculate the estimated solution for the >40000 scenario in _Table 1_:\n\n$ go run main.go solve 128 2 0.99 0.005 \nSolving for (k:128, s:2, p:0.99, threshold:0.005) \n\[1, 65536\]: c=32768 p=0 \n\[32768, 65536\]: c=49152 p=1 \n\[32768, 49152\]: c=40960 p=0 \n\[40960, 49152\]: c=45056 p=0.796 \n\[45056, 49152\]: c=47104 p=1 \n\[45056, 47104\]: c=46080 p=1 \n\[45056, 46080\]: c=45568 p=1 \n\[45056, 45568\]: c=45312 p=1 \n\[45056, 45312\]: c=45184 p=0.956 \n\[45184, 45312\]: c=45248 p=0.976 \n\[45248, 45312\]: c=45280 p=0.998 \n\[45248, 45280\]: c=45264 p=0.986 \nSolution c=45264 with p=0.986 (34220ms)\n\nGood, pretty much in line with the Table 1 estimation.\n\nLet’s force the simulation to find a solution for a _k_ that doubles the biggest one analyzed in the paper, and a number _s_ that makes the worst-case scenario for 
_k_:\n\n$ go run main.go solve 256 2 0.99 0.005 \nSolving for (k:256, s:2, p:0.99, threshold:0.005) \n\[1, 262144\]: c=131072 p=0 \n\[131072, 262144\]: c=196608 p=1 \n\[131072, 196608\]: c=163840 p=0 \n\[163840, 196608\]: c=180224 p=0.076 \n\[180224, 196608\]: c=188416 p=1 \n\[180224, 188416\]: c=184320 p=1 \n\[180224, 184320\]: c=182272 p=1 \n\[180224, 182272\]: c=181248 p=0.964 \n\[181248, 182272\]: c=181760 p=1 \n\[181248, 181760\]: c=181504 p=0.994 \nSolution c=181504 with p=0.994 (142453ms)\n\nWe found the solution in a reasonable time: 2 minutes and 22 seconds. Alberto confirmed that these times are several orders of magnitude faster than computing _Theorem 4_ for _Table 1_ in the paper.\n\n### compare command\n\nThe paper discusses the _soundness_ property of the solution; that is, understanding whether any light-nodes would complete pulling their shares before being alerted of a data-unavailability problem.\n\nAs a summary, the Standard model allows the full-node to recognize which light-node is asking for each share. Thus, the full-node is able to select which light-nodes to reply to in order to satisfy the maximum possible number of light-nodes.\n\nThe Enhanced model implies not allowing the full-node to recognize which light-node is asking for each share. Thus, the full-node can’t have certainty about how many light-nodes are close to being satisfied.\n\nIn the simulation, each time the full-node receives a share request it has the option to accept or reject the request. If the latter happens, the full-node is considered malicious, meaning that it probably has intentions of making data unavailable.\n\nWhen simulating with the Standard model, the _c_ light-nodes run serially. This models the worst-case scenario for the light-nodes because it violates _soundness_ as much as possible. The first _z_ light-nodes, each asking for _s_ shares, produce a total of _z\*s_ share requests. 
While this number is small enough compared with the full-node rejection criterion, everything appears to run smoothly. However, when _z_ comes close to the critical point near _c_, the full-node will probably start rejecting requests_._\n\nOn the other hand, when the simulation runs in the Enhanced model, a random light-node is elected to ask for the next share. Since the light-nodes are selected randomly, on average they all progress evenly in their journey of asking for their corresponding _s_ shares. Thus, fewer of them will complete their journey before someone notices the data unavailability.\n\nTo understand this better we have the _compare_ command. This command compares the two models for a _k_ and _s_ setup over various values of _c_. For each simulation, it calculates how many light-nodes finished asking for their _s_ shares before the full-node makes a rejection.\n\nIt automatically generates a plot as a _png_ file to understand the results:\n\n$ go run main.go help compare \nCompares Standard and Enhanced model to understand their impact on soundness\n\nUsage: \n fraudproofsim compare \[k\] \[s\] \[#points\] \[flags\]\n\nFlags: \n -h, --help help for compare\n\nGlobal Flags: \n --enhanced run an Enhanced Model \n --n int number of iterations to run per instance (default 500)\n\nThe _#points_ parameter is the number of points we want to generate to make the interpolation.\n\nLet’s compare for a setup:\n\n$ go run main.go compare 64 10 25 \nSolving c for (k: 64, s: 10) with precision .99+-.005: \n\[1, 16384\]: c=8192 p=1 \n\[1, 8192\]: c=4096 p=1 \n\[1, 4096\]: c=2048 p=0 \n\[2048, 4096\]: c=3072 p=1 \n\[2048, 3072\]: c=2560 p=1 \n\[2048, 2560\]: c=2304 p=1 \n\[2048, 2304\]: c=2176 p=0 \n\[2176, 2304\]: c=2240 p=0.896 \n\[2240, 2304\]: c=2272 p=0.998 \n\[2240, 2272\]: c=2256 p=0.99 \nFound solution c=2256, now generating 25 points in \[.50\*c,1.5\*c\]=\[1128, 3384\]: \n0% \n3% \n7% \n11% \n15% \n19% \n23% \n27% \n31% \n35% \n39% \n43% 
\n47% \n51% \n55% \n59% \n63% \n67% \n71% \n75% \n79% \n83% \n87% \n91% \n95% \n99% \nPlotted in plot.png\n\nThe first thing it does is to solve _c_ for _p=.99_. This is done in order to plot for values of _c_ between .50 and 1.5 of _c_. Let’s see the generated _plot.png_:\n\n![](https://hackernoon.com/hn-images/1*ezgV1GyG9Lnss7zFhQ5JLw.png)\n\nThe result of the compare command for k=64 and s=10\n\nInteresting!\n\nFor values of _c_ lower than the one for the .99 guarantee, we see that both the Standard and Enhanced models have the same result. All the light-nodes finish asking for their _s_ shares successfully even when the full block isn’t guaranteed to be available. This is reasonable; thinking of the extreme case of _c=1_, it’s evident that asking only for _s_ shares will be successful.\n\nWhen we reach and exceed the critical point of _c_ (for _p_=.99), something interesting happens.\n\nIn the Standard model, if we keep adding light-nodes, the total number of _fooled_ light-nodes is bounded. This sounds reasonable since _c\*s_ shares were already asked for, and the full-node is pretty much doomed to reject any further share request if it is interested in making the block unavailable. This means that no more light-nodes can be tricked.\n\nIn the Enhanced model we see something different. The more light-nodes we add beyond _c_, the fewer light-nodes in total finish asking for their _s_ shares. Since each share request the full-node receives comes from a random light-node, when the full-node reaches the critical point of share rejection, not many light-nodes have yet finished asking for their shares. The light-nodes _share the risk_ evenly.\n\nIf the network has more than the minimum amount of _c_ light-nodes, then rapidly fewer and fewer light-nodes complete pulling their _s_ shares before someone finds out that the full-node is malicious and alerts the rest of the network.\n\nFor other setups, we see the same shape. 
Intuitively we expect that _s_ will have some influence on how fast the Enhanced model improves soundness. Let’s check this intuition.\n\nWith the same _k=64_ and a bigger _s_ (also more calculated points, which don’t affect anything but the plot interpolation):\n\n$ go run main.go compare 64 50 50 \nSolving c for (k: 64, s: 50) with precision .99+-.005: \n\[1, 16384\]: c=8192 p=1 \n\[1, 8192\]: c=4096 p=1 \n\[1, 4096\]: c=2048 p=1 \n\[1, 2048\]: c=1024 p=1 \n\[1, 1024\]: c=512 p=1 \n\[1, 512\]: c=256 p=0 \n\[256, 512\]: c=384 p=0 \n\[384, 512\]: c=448 p=0.93 \n\[448, 512\]: c=480 p=1 \n\[448, 480\]: c=464 p=1 \n\[448, 464\]: c=456 p=1 \n\[448, 456\]: c=452 p=0.998 \n\[448, 452\]: c=450 p=0.978 \n\[450, 452\]: c=451 p=0.992 \nFound solution c=451, now generating 50 points in \[.50\*c,1.5\*c\]=\[225, 676\]: \n0% \n1% \n3% \n... \n97% \n99% \nPlotted in plot.png\n\nAnd the plot:\n\n![](https://hackernoon.com/hn-images/1*whJRu-DUwXUjmD3YT1zJOQ.png)\n\nResult of the compare command for k=64 and s=50\n\nYes! A bigger _s_ makes the Enhanced model much more aggressive in its soundness guarantee with respect to _c_.\n\nFinally, let’s look at a smaller _s:_\n\n$ go run main.go compare 64 2 50 \nSolving c for (k: 64, s: 2) with precision .99+-.005: \n\[1, 16384\]: c=8192 p=0 \n\[8192, 16384\]: c=12288 p=1 \n\[8192, 12288\]: c=10240 p=0 \n\[10240, 12288\]: c=11264 p=0.97 \n\[11264, 12288\]: c=11776 p=1 \n\[11264, 11776\]: c=11520 p=1 \n\[11264, 11520\]: c=11392 p=1 \n\[11264, 11392\]: c=11328 p=0.998 \n\[11264, 11328\]: c=11296 p=0.994 \nFound solution c=11296, now generating 50 points in \[.50\*c,1.5\*c\]=\[5648, 16944\]: \n0% \n1% \n3% \n5% \n... 
\n97% \n99% \nPlotted in plot.png\n\n![](https://hackernoon.com/hn-images/1*jD162PJQDTd7uHNWAY0xdQ.png)\n\nResult of the compare command for k=64 and s=2\n\nWe can appreciate that with a smaller _s_, _soundness_ is guaranteed much more slowly.\n\n### Running times and bottlenecks\n\nThe running times of the simulation depend on four factors: _s_, _k_, _c_ and the _number of iterations_.\n\nI profiled the code using _pprof_ multiple times to find where the bottleneck in the simulation is. It turns out that the bottleneck is where each light-node decides which _s_ shares to ask for out of the _4k²_ shares. I’ve made multiple implementations of this particular part to improve the running times.\n\nAs a summary, I noticed a lock-contention issue within the default _rand_ library. After some research, I found that [other people](https://blog.sgmansfield.com/2016/01/the-hidden-dangers-of-default-rand/) had already noticed that the _rand_ standard library uses a mutex; something quite reasonable when using a singleton random seed in a concurrent library.\n\nI then decided that each light-node would create its own random seed to avoid lock contention between different goroutines. After another _pprof_, I found out that this was quite compute-expensive; better than the original lock contention, but expensive. Concurrency in the simulation is at the iteration level, so generating a random seed for each iteration instead of for each light-node made a significant improvement.\n\nProfiling the code and finding these things was fun too. Maybe I’ll explain all this in more detail in a further article.\n\n### Possible Improvements\n\nThere are a bunch of things that could be improved or enhanced in the simulation.\n\n#### Domain-size and binary-search for solve command\n\nIn the _solve_ command, I do a binary search for _c_ until the explored domain is exhausted or the desired _p_ lies within a chosen _threshold_. 
There may be other ways of implementing the _solve_ command or reducing the domain of search.\n\n#### Full-node decision-making for share request rejection\n\nThe full-node makes the first share request rejection when 4_k²-(k+1)²_ shares have been shared. That number corresponds to the _maximum_ number of shares the full-node could give and still make the block unavailable.\n\nThis is the best-case scenario for the full-node, in which the light-nodes ask for a particular subset of shares. Since the data is encoded with 2D Reed-Solomon, each unavailable share could be reconstructed from a row or a column point of view. This full-node _best case_ happens when the unshared shares are always in the rows and columns still unrecoverable from the block. In other words, the unshared shares contribute _as much as possible_ to block unavailability.\n\nThis implies that the simulation is pessimistic, so it lies on the _safe side_ of the claims it makes. Said differently, most of the time the block will be recoverable even when the simulation still considers it isn’t.\n\nThis aspect of the simulation could be improved if, each time the full-node makes a new share available, it calculated which remaining shares are unrecoverable, i.e., which still can’t be reconstructed from the already-shared shares using the 2D Reed-Solomon encoding. After the last unrecoverable share is shared, the full-node can be considered _doomed_. Notice that unrecoverable and unshared are different things. An unshared share could be reconstructed if enough shares of its column or row are available.\n\n#### Running time and memory usage\n\nThe simulation already has some moderate optimizations, the result of multiple _pprof_ profilings. 
Since the simulation is CPU-bound, it only needs as many random seeds as there are cores in the CPU to avoid lock-contention issues, so the _Simulation.Init()_ method could be improved.\n\nAlso, there could be a better solution for generating the random subset of _s_ distinct elements out of a set of 4_k²_ elements (the shares to pull). The current solution is nice since it exploits the fact that _s_ is much smaller than _(2k)²._\n\nI’ve done no memory profiling, so I’m convinced there are many things to be improved in this direction.\n\n#### Simulation configuration scan\n\nThe configurations in the simulation are fixed and correspond to the ones proposed in the paper. This could be easily changed, and it would be useful to search for a configuration that matches some required criteria.\n\nMustafa suggested that it may be interesting to include network bandwidth usage or latency. In the same vein, we could scan for network configurations that optimize or establish bounds on these metrics within some minimum security requirements. The possibilities are broad, and we could play a lot with it.\n\n#### Explore other possible codings and their impact on results\n\nAnother idea mentioned by Mustafa was considering the impact on the network of using other codings for the block data. For example, adding more dimensions with Reed-Solomon, or using other codings entirely.\n\n### Conclusion\n\nThe simulation verified the mathematical calculations in the paper and provided a way to estimate values of the model several orders of magnitude faster than computing the closed mathematical formulas.\n\nIt also helps to build better intuition about how the Standard and Enhanced models impact the _soundness_ property of the system.\n\nFurther work could be done to improve the simulation in various directions.\n\nFinally, I’d like to thank both Mustafa and Alberto for their opinions and suggestions. 
Special thanks to Mustafa for various ideas for the article, and for kindly reviewing a draft.