The Ethereum-blockchain size has exceeded 1TB, and yes, it’s an issue

(TL;DR: It has nothing to do with storage space limits)

Introduction

This is an indirect response to the following article by Afri Schoedon, a developer for the Parity Ethereum client, written less than a year ago:

The Ethereum-blockchain size will not exceed 1TB anytime soon. (dev.to)
_Once a month users post a chart on `r/ethereum` predicting the blockchain size of Ethereum will soon exceed 1 TB. I…_

I want to make it clear that I have respect for almost all of the developers in this space, and this is not intended to attack anyone. It’s meant to elaborate on what the real concerns are and explain how the original article does nothing to address those real concerns. I would actually love to see something that does, because then we can throw it into Bitcoin. That being said, there are some developers who mislead, obscure, ignore, and attack via protocol confusion, like what occurred with 2X and the replay protection drama, but most aren’t like that. You can’t watch something like this or read something like this and hate these developers. They’re genuinely trying to fight the same fight as us, and I believe Afri is part of the latter group, not the former.

https://github.com/paritytech/parity/issues/6372

If you’ve read my other articles you’re going to see some small bits of that information repeated. Up until now I wrote primarily about Bitcoin from a “maximalist” perspective (still am) and focused on conflicts within that community. What you may find interesting, if you only watch from the corner of your eye, is that the reason for “conflict” here is exactly the same. I’ll even use Proof-of-Stake as further leverage for my argument without criticizing it.

Edit: It seems like people are not reading the subtitle and are misunderstanding something. This is not about archival nodes. This is about fully validating nodes. I don’t care if you prune the history or skip the line to catch up with everyone else. This is about staying in sync, after the fact. Light nodes aren’t nodes.

This has become a 2-part article. When you’re done with this article you can read the follow-up one:

Sharding centralizes Ethereum by selling you Scaling-In disguised as Scaling-Out (medium.com)
_The differences between Light-Clients & Fully-Validating Nodes_

Index

My Argument: Ethereum’s runaway data directory size is just the tip.
My Prediction: It will all work, until it doesn’t.
My Suggestion: Transpose.

My Argument:

Larger blocks centralize validators.

It’s that simple. It’s the central argument in the entire cryptocurrency community in regards to scaling. Not many people familiar with blockchain protocol actually deny this. The following is an excerpt from what I consider to be a very well put together explanation of various “Layer 2” scaling options. (Of which, the only working one is already implemented on Bitcoin.)

https://medium.com/l4-media/making-sense-of-ethereums-layer-2-scaling-solutions-state-channels-plasma-and-truebit-22cb40dcc2f4

That article is written by Josh Stark. He gets it. His company even announced a project that’s meant to mirror the Lightning Network on Ethereum. (Which is oddly coincidental given Elizabeth Stark’s company is helping build Lightning.)

The problem? Putting everything about Proof of Stake completely to the side, the incentive structure of the base layer is completely broken because there is no cap on Ethereum’s blocksize, and even if one was put in place it would have to be reasonable, and then these Dapps wouldn’t even work because they’re barely working now with no cap. It doesn’t even matter what that cap is set at for this argument to hold, because right now there is none in place.

Let’s backtrack a bit. I’m going to briefly define a blockchain and upset people.

Here is what a blockchain provides:

An immutable & decentralized ledger. That’s it.
Here is what a blockchain needs to keep those properties:

A decentralized network with the following prerogatives:
Distribute my ledger: Validate
Append my ledger: Work
Incentivise my needs: Token

Here is what kills a blockchain:

Any feature built into the blockchain that detracts from the network’s goals.

A blockchain is just a tool for a network. It’s actually a very specific tool that can only be used by a very specific kind of network. So much so that they require each other to exist and fall apart when they don’t co-operate, given enough time. You can build on top of this network, but quite frankly anything else built into the base layer (L1) that negatively affects the network’s ability to do its job is going to bring the entire network to its knees… given enough time.

Here’s an example of an L1 feature that doesn’t affect the network: Multisig. It does require the node to do a bit of extra work, but it’s “marginal”. The important thing to note is that hardware is not the bottleneck for these (properly designed) networks, network latency is. Something as simple as paying to a multi-signature address won’t tax the network any more than paying to a normal address does, because you’re paying on a per-byte basis for every transaction. It’s a blockchain feature that doesn’t harm the network’s ability to continue doing its job because the data being sent over the network is (1) paid for per-byte, and (2) regulated via the blocksize cap. Regulated, not “artificially capped”. The blocksize doesn’t restrict transaction flow, it regulates the amount of broadcast-to-all data being sent over the network. Herein lies the problem.

When we talk about the “data directory” size, it’s a direct reference to the size of the entire chain of blocks from the original genesis block, but taking this at face value results in the standard responses:

Disk space is cheap, also see Moore’s Law.
You can prune the blockchain if you need to anyway.
You don’t need to validate everything from the genesis block, the last X amount of blocks is enough to trust the state of the network.

What these completely ignore is the data a node must process per-second.

You can read my entire article about Moore’s Law if you want, but I’ll excerpt the important part below. Over in Oz they try and argue “you don’t need to run a node, only miners should decide what code is run”. It’s borderline absurd, but I won’t have to worry about that here because Proof of Stake completely removes miners and puts everything on the nodes. (They always were, but now there aren’t miners to divert the argument.)

Moore’s Law is a measure of integrated circuit growth rates, which averages to 60% annually. It’s not a measure of the average available bandwidth (which is more important).

Bandwidth growth rates are slower. Check out Nielsen’s Law. Starting with a 1:1 ratio (no bottleneck between hardware and bandwidth), with hardware growing 60% annually and bandwidth growing 50% annually, 10 years of compound growth results in a ~1:2 ratio, since (1.6 / 1.5)^10 ≈ 1.9. This means bandwidth falls behind by a factor of 2 in 10 years, 4 in 20 years, 8 in 30 years, and so on… (It actually compounds much worse than this, but I’m keeping it simple and it still looks really bad.)

Network latency scales slower than bandwidth. This means that as the average bandwidth speeds increase among nodes on the network, block & data propagation speeds do not scale at the same rate.

Larger blocks demand better data propagation (latency) to counter node centralization.

Strictly from an Ethereum perspective, with a future network of just nodes after the switch to Proof of Stake, you’d generally want to ensure node centralization is not an issue. The bottleneck for Bitcoin’s network is its blocksize (as it should be), because it ensures the growth rate of network demands never exceeds the growth rate of external (and in some cases indeterminable) limitations like computational performance or network performance. Because of Ethereum’s exponentially growing blocksize, the bottleneck is not regulated below these external factors, and as such results in a shrinking and increasingly more centralized network due to network demands that exceed the average user’s hardware and bandwidth.

Bitcoin SPV clients aren’t nodes. They don’t propagate blocks or transactions around the network, they leech, and all that they leech are the block headers.

Remember this because it’s going to get very important later in this article: You can put invalid transactions into a block and still create a valid block header. If the network is controlled by 10 FULL-nodes, you only need half of them to ignore/approve invalid transactions so long as the header is valid.

This is why validating the transactions matters from a network perspective, and why you need a large decentralized network. It doesn’t matter from my grandma’s perspective and that’s fine, but we aren’t talking about my grandma. We’re talking about ensuring the network of working and actively participating nodes grows, not shrinks.

This node was participating until it got cut off due to network demand growth:

https://www.reddit.com/r/ethereum/comments/58ectw/geth_super_fast/d908tik/

It’s not uncommon and it continues to happen:

https://github.com/ethereum/go-ethereum/issues/14647

Notice how the solution is to “find a good peer” or “upgrade your hardware”? Good peers shouldn’t be the bottleneck. Hardware shouldn’t be either. When all of your peers are hosed up by so many others leeching from them (because the good peers are the ones doing the real work), you create a network of masters and slaves that gradually trends towards only one master and all slaves. (If you don’t agree with that statement you need to make a case for how this trend will subside in the future, because currently that’s the direction this is going towards and it won’t stop unless a cap is put in place. If your answer is sharding, I address that fairy dust at the end.) It’s the definition of centralizing.

Unregulated blocks centralize networks. Large (but capped) blocks are only marginally better, but they set a precedent of increasing the size “in times of need”, which is equally as bad because an ever increasing block size mirrors the results of unregulated blocksizes. This is why we won’t budge on the Bitcoin blocksize.

I tweeted about it a few times but clearly I didn’t think that was enough. My Twitter reach doesn’t really extend much into the Ethereum space.

That chart is symbolic and not representative of any actual numbers. It only serves to visually express the point I’m trying to make. To clarify, the green curve represents an aggregated average of the various demands of the Ethereum Network. At some point your node will fall out of sync because of this, or a blocksize cap will be put in place. It could happen now, or it could happen in 10 years, or in 50, but your node will fall out of sync at some point at this rate. It will never happen in Bitcoin. You can deny it now all you want, but this article will be here for when it happens, and when it does asinine Dapps like CryptoKitties, Shrimp Farm, Pepe Farm, and whatever comes next will cease to function.

This is exactly what happened to Ryan Charles’ service Yours.org that he originally built on Bitcoin. The only difference being Bitcoin already had the cap in place and Ryan either didn’t foresee this from a lack of understanding, or for some reason he expected the blocksize to keep getting raised. Instead of reassessing he doubled down on BCash, meanwhile Yalls.org took his concept and implemented the same exact thing on top of Bitcoin’s Lightning Network.

My Prediction:

Ethereum will implement a blocksize cap and it will race BCash to both of their deaths.

Chart source: http://bc.daniel.net.nz/ (←No longer updating statistics, chart is edited & extrapolated using REAL current data.)
https://ethereum.stackexchange.com/questions/143/what-are-the-ethereum-disk-space-needs

The chart above isn’t even a prediction. This is me filling in the blanks (in yellow) on what was the last remaining graph that compared both chains’ data directories, and then extrapolating from it. Here’s what we know:

Bitcoin’s future is predictable. The blockchain growth & network demands will always be linear. (Ideal)
The amount of data an Ethereum node is required to process per second is through the roof and climbing. (Unideal)
If Ethereum on-chain demand freezes where it is now, blockchain growth will continue the linear trend highlighted by that dotted line. (Very bad)
If Ethereum on-chain demand continues to grow exponentially, the amount of people complaining about their node going out of sync will reach a tipping point. (There’s only one option when this occurs.)

That graph above? The owner stopped trying to maintain the node. Physical demands are an issue as well, like time constraints in your personal life. Servicing requirements need to be low, not high, not reasonable… low. Do you know what I do to service my Bitcoin/Lightning node? I leave my laptop on. That’s it. If I have to reboot I shut down the services, reboot, and start them back up again. Day to day I use my laptop for an assortment of other tasks, none of which inhibit its ability to run the node software.

With all due respect, if a change was implemented and forced on me that resulted in my node no longer being compatible with the network and unable to maintain a sync, I would flip out over the idiocy that allowed that, if I was a misinformed individual. Fortunately I’m not, and I signed up for a blockchain with foresight (Bitcoin).

The problem? I don’t think most of the people running Ethereum nodes are informed enough to know what they signed up for. I don’t think they understand the fundamental incentive models, and I don’t think they fully realize where and why they break down with something as simple as not having a blocksize cap. Hopefully this article will succeed at teaching that.

So what happens when that psychological tipping point is reached? Do people give up? How many nodes have to be lost for this to occur? The explorer websites aren’t even tracking this data anymore. Etherscan.io is no longer tracking full or fast sync directories, Etherchain.org says:

Error: Not Found

Etherscan also isn’t letting you zoom out on the memory pool, the queue of transactions waiting to be included into blocks. The reason fees go up is because this queue builds up. You should be able to see this over time. Here’s one that tracks Bitcoin’s mempool, side by side with the Etherscan.io one:

https://jochen-hoenicke.de/queue/#1,4d
https://etherscan.io/chart/pendingtx

Both of these charts are monitoring the rough total pending transaction counts on these networks, and the scales are about the same, 4/5 days respectively. The difference? I can zoom out on the Bitcoin one and see the entire history. Why does this matter? Psychology matters when your network has no regulated upper boundaries. Here’s what ours looks like zoomed out:

See what I mean? See how scale matters? What if I zoomed out on Ethereum’s mempool and saw that it was at the top of an ever growing mountain? I’m not saying that’s where it is today, but I am saying that this information needs to stop being obscured. I’m also saying that if/when it ever is unobscured, it’ll be too late and nothing can be done about it anyway. It’s already too late now.

Let’s take a look at block and transaction delay on Bitcoin’s network. Below you’ll see two charts.
The 1st one is how long it takes for a block to spread across the network, the 2nd is for a transaction. Transactions are processed by the nodes (all 115,000 of them) and held onto until a valid block is created by a miner and announced to the network.

Block propagation times have dropped drastically because of very well designed improvements to the software. Transactions are validated when they come in and kept in the mempool. When a new block is received, it’s quickly cross referenced with all the transactions you already have stored, and very rarely includes many transactions you haven’t received yet. This allows your node to validate that block extremely fast and send it out to all your other peers.

Transaction times on the other hand have slowly gone up but seem to be stabilizing. They’ve been “intentionally” allowed to go up as a result of privacy improvements in the software, but that’s a worthy tradeoff because blocks are 10 minutes apart on average anyway, so a delay of 16 seconds is acceptable. I’d imagine that once blocks are consistently full this growth will level off, because transaction fees from the blocksize cap will self-regulate the incoming flow of transactions, assuming no other protocol changes are made. Keep in mind, none of this information is available for Ethereum:

https://dsn.tm.kit.edu/bitcoin/

Bitcoin is designed with this in mind. The transaction count queue goes up but the blocks are regulated. People end up learning how to use this tool we call a blockchain the right way over time and transaction flow stabilizes. With an unregulated tool you end up with a bunch of people chaotically trying to use that tool all at once for some random “feature” like CryptoKitties that ends up grinding the entire thing to a halt until the backlog is processed. All of the Ethereum full-nodes need to process every single one of these contracts. You might not need to, and they might tell you that you don’t need to, but someone does need to. So how many of them are there? What do higher fees do? They deter stupid Dapps like CryptoKitties at the base layer. There is absolutely zero need for them, and larger more “functional ideas” will only experience the same thing but much worse because blockchains don’t scale.

These Dapps are crippling your blockchain because it’s unregulated:

But that was the promise though, right? That was the dream. That was the entire premise of the Ethereum blockchain: Bitcoin, but better. It’s not. Clearly unregulated blocks don’t result in infinite transactions, but the real takeaway here is the network physically can’t even handle the current amount, there just aren’t enough nodes capable of processing that information and relaying it in a timely fashion.

Do you know how many Ethereum nodes there are? Do you really know?

The Bitcoin network has about 115,000 nodes, of which about 12,000 are listening-nodes. Almost all of them are participating nodes, because that’s regulated too. What a listening-node is, compared to a non-listening one, doesn’t matter here because they are all participating in sending and receiving blocks to and from the peers they are connected to. The default is 8, the client won’t even let you get more than 8 unless you add them manually. This was intentionally put in place, and it’s recommended you don’t add more because it’s unhealthy for the network:

https://bitcoin.stackexchange.com/a/8140

Remember this from earlier? Find a good peer. That’s not how you fix things. This is a prime example of why a chain that allows participants the freedom to be selfish via lack of regulation is bad. This only has one outcome: Master & Slave nodes, where the limited masters serve all the slave nodes. Sounds decentralized, right? Especially when the financial requirements to be one of those master nodes keep going up…

To be fair, and as an aside: This is the exact criticism the Lightning Network gets, but it’s a completely different type of network. Blockchain networks are peer-to-peer broadcast networks. State-Channel networks like Lightning are peer-to-peer anycast networks. The way information is being sent is completely different. Your refrigerator has enough hardware to be a Lightning node. Lightning “Hub & Spoke” criticisms are with channel balance volumes. Hub & Spoke is equivalent to the Master & Slave issues, but with channel balances there is no bottleneck on the data. You just standardize the Lightning clients to open X amount of channels with X amount of funds in each, then the network forms around that standard, completely avoiding hubs or spokes, just like the Bitcoin clients standardize 8 peers. The Lightning Network is new so we don’t know what that standard should be yet, because we have almost zero data we can measure. /endlightningdefense

Speaking of zero data we can measure, why are these the only charts for Ethereum node counts? Where’s the history? How many of these nodes used fast/warp sync and never fully validated it all? You don’t need to store it all because you can prune, but again, how many are fully validated? How many are just light clients syncing only the block headers?

https://www.ethernodes.org/network/1

It’s funny how propaganda sites like Trustnodes pushing BCash conspiracies publish pieces like the following one with bold-faced lies, then it gets circulated around and no one outside the flow of correct information questions it:

I’m not linking to a BCash propaganda site.

There are 115,000 Bitcoin nodes and they all fully validate:

http://luke.dashjr.org/programs/bitcoin/files/charts/software.html

So what do you do now? What do you do as an individual who slowly comes to this realization? What do you do as an individual who has no idea what’s going on? What happens to a network that is primarily made up of these individuals that slowly leave (not literally, but as a participating node downgrading to a light-node)? How many participating nodes are left? How many nodes hold a full copy of the original genesis block? What happens when 5 data centers are serving the entire network of slaves (light-nodes) the chain? Who’s validating those transactions when everyone is only syncing the block headers? You can sit there and repeat time and time again that “the network only needs the recent state history to be secure” all you want, but when your network is broken from the bottom up and most nodes can’t even keep up with the last 1,000 blocks, how is that secure in any way?

The takeaway from all of this:

Ethereum’s blocksize growth is bad because of node processing requirements, not how much they need to store on a hard-drive.
To prevent complete collapse of the network, Ethereum will need to implement a reasonable blocksize cap.
Implementing a blocksize cap will raise fees and in return prevent many Dapps from functioning, or severely slow them down. Future Dapps won’t work.
If Dapps don’t work, Ethereum’s entire proposition for existing is moot.

Where does BCash fit in?

BCash just increased their blocksize from 8MB to 32MB, and is adding new OP_CODES soon to allow “features” like ICOs and BCash Birdies™.
BCash has “room to grow” coming from a completely understressed blockchain, while Ethereum is a completely overstressed blockchain.

https://txhighway.cash

Ethereum is dying and BCash is trying to be exactly like it while ignoring all the warning signs we’ve been trying to bring to everyone’s attention. They wanted bigger blocks and ICOs, they got it now. Both chains will become the same thing: Centrally controlled blockchains that will slowly die, but given temporary life support via gradual blocksize increases to continue supporting fraudulent utility tokens, until the entire system breaks down when no one can run a node.

My Suggestion:

Stop using centralized blockchains.

This section has been extensively expanded on in the follow-up article. The diagrams have been completely redone.
Reading that is a must after this.

The only one in that room that runs a fully validating node is the one that’s simultaneously holding up the painting, and the Ethereum network.

I’ve managed to make no mention of Vitalik this whole article so I can focus on the technicalities, but if this picture (or the original) doesn’t represent the essence of the Ethereum space then I don’t know what does. I applaud Vitalik for calling out scammers like Fake Satoshi, yet at the same time he equally misrepresents the functionality claims of Ethereum.

Oh, and that golden goose egg you call sharding? It’s hocus pocus. Fairy dust. It’s the same node centralization issue with a veil thrown in front of it. It’s effectively force feeding you the Master & Slave network I just warned you about, under the disguise of “new scaling tech”.

Forgetting Vitalik’s diagram he put out because it’s meaningless, let’s try to simplify Ethereum’s current network first. The diagram below essentially shows all the light-clients in pink and the “good” full-nodes in purple. Your fast/warp sync node may be purple now, until it can’t keep up or you give up on upgrading/maintaining and just use the light client feature, then it joins the pink group.

As time moves forward, the pink nodes increase while the purple decrease. This is inevitable because it’s what everyone is already doing. Do you run a full-node or a light client? Do you run anything at all? Switching to using the light client is consistently recommended “if syncing fails”. That’s not a fix.

Don’t worry though, Vitalik is here to save the day. He’s turning “nodes” into SPV clients that only sync the block headers:

But what does that mean? Well, fortunately I wasted a lot of time writing and drawing this up too so I can explain it visually, but first let’s start with words:

In Bitcoin you either fully validate, or you don’t. You’re either:

A Full-Node, and you do everything. You fully validate all transactions/blocks.
An SPV Client that is just tethered to a full-node, syncs just the block headers, does nothing, and shares nothing. They are not part of the network. They shouldn’t even be mentioned here but I’m doing it to avoid confusion.

Again, there are 115,000 Bitcoin full-nodes that do everything.

https://twitter.com/StopAndDecrypt/status/1002666662590631942

You can either read about this in more depth in Part 2, or you can take a look at the standalone article below:

Bitcoin Miners Beware: Invalid Blocks Need Not Apply (hackernoon.com)
_Bitcoin is an impenetrable fortress of validation._

In Ethereum there are:

Full-Nodes that do everything. They fully validate all transactions/blocks.
Nodes that try to do everything but can’t sync up because of peer issues, so they skip the line and use warp/fast sync, and then “fully”-validate new transactions/blocks.
Light-“nodes” that are permanently syncing just the block headers, and I guess they are sharing the headers with other similar nodes, so let’s call these “SPV Nodes”. They don’t exist in Bitcoin; again, SPV clients in Bitcoin don’t propagate information around, they aren’t nodes.

That Ethereum node count? Guarantee you those are mostly Light-Nodes doing absolutely zero validation work (checking headers isn’t validation). Don’t agree with that? Prove me wrong. Show me data. They are effectively operating a secondary network of just sharing the block headers, but fraudulently being included in the network node count. They don’t benefit the main network at all and just leech.

In New-Ethereum (2.0) with Sharding, things change a bit. I’ve gone ahead and edited out this section because I wrote an entire second article on it that does a much better job at explaining this, and the differences between Bitcoin and old and new Ethereum (2.0):

Sharding centralizes Ethereum by selling you Scaling-In disguised as Scaling-Out (hackernoon.com)
_The differences between Light-Clients & Fully-Validating Nodes_

This isn’t scaling.
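To make the difference between checking a header and validating a block concrete, here is a minimal Python sketch. It is a toy model and not Bitcoin’s or Ethereum’s actual block format: the two-character proof-of-work prefix, the single hash standing in for a Merkle root, the simple balance map, and every function name are assumptions made up for illustration. It only shows the general idea that a header can pass every check a header-only client is able to make while the block behind it contains a transaction a fully validating node would reject.

```python
import hashlib
import json

# Toy proof-of-work target (an assumption for this sketch, not a real network rule)
DIFFICULTY_PREFIX = "00"


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()


def tx_root(transactions) -> str:
    # Simplified stand-in for a Merkle root: one hash over all transactions
    return sha256_hex(json.dumps(transactions, sort_keys=True))


def header_hash(header) -> str:
    return sha256_hex(json.dumps(header, sort_keys=True))


def mine_header(prev_hash, transactions):
    # The miner commits to whatever transactions it likes, valid or not
    header = {"prev": prev_hash, "root": tx_root(transactions), "nonce": 0}
    while not header_hash(header).startswith(DIFFICULTY_PREFIX):
        header["nonce"] += 1
    return header


def header_is_valid(header, prev_hash) -> bool:
    # All a header-only client can check: chain linkage and proof-of-work
    linked = header["prev"] == prev_hash
    return linked and header_hash(header).startswith(DIFFICULTY_PREFIX)


def block_is_valid(header, transactions, prev_hash, balances) -> bool:
    # What a fully validating node checks: the header AND every transaction
    if not header_is_valid(header, prev_hash):
        return False
    if header["root"] != tx_root(transactions):
        return False
    working = dict(balances)  # copy so the caller's state is untouched
    for tx in transactions:
        if tx["amount"] <= 0 or working.get(tx["from"], 0) < tx["amount"]:
            return False  # spends money the sender does not have
        working[tx["from"]] -= tx["amount"]
        working[tx["to"]] = working.get(tx["to"], 0) + tx["amount"]
    return True


genesis_hash = "0" * 64
balances = {"alice": 5}

# alice only has 5, so this transaction is invalid
bad_txs = [{"from": "alice", "to": "mallory", "amount": 500}]
bad_header = mine_header(genesis_hash, bad_txs)

light_client_accepts = header_is_valid(bad_header, genesis_hash)
full_node_accepts = block_is_valid(bad_header, bad_txs, genesis_hash, balances)
print("header-only client accepts:", light_client_accepts)
print("fully validating node accepts:", full_node_accepts)
```

In this toy model the header-only check passes and the full check fails, which is the whole point: someone has to run `block_is_valid`, and the fewer nodes that do, the fewer parties you are trusting to do it for you.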
When your node can’t stay in sync it downgrades to a light client. Now with sharding it can downgrade to a “shard node”. None of this matters. You’re still losing a full-node every time one downgrades. What’s even worse is they are calling all the pink dots nodes even though they are only syncing the headers and trusting the purple nodes to validate.

How would you even know how many fully validating nodes there are in this setup? You can’t even tell now, because the only sites tracking it count the light clients in the total. How would you ever know that the full-nodes centralized to, let’s say, 10 datacenters? You’ll never know. You. Will. Never. Know.

On the other hand, Bitcoin is built from the ground up to prevent this:

https://twitter.com/_Kevin_Pham/status/999152930698625024

So what are you going to do? What should you do?

Are you a developer? Take everything you’ve learned and start developing applications on top of a good blockchain. One that isn’t broken.

Are you a merchant? Start focusing on readying your services to support payment networks. Ones that are built on top of a good blockchain.

Are you an investor? Take everything you’ve invested and start investing in a good blockchain. One that isn’t going to die in the coming years.

Are you a gambler? Buy EOS. It’s newer, just as shitty for all the same reasons I mentioned above, just no one knows it yet.

Are you an idealist? This is definitely not the chain for you. Find one that is.
https://twitter.com/StopAndDecrypt/status/992766974022340608

Part 2:

Sharding centralizes Ethereum by selling you Scaling-In disguised as Scaling-Out (medium.com)
_The differences between Light-Clients & Fully-Validating Nodes_

If you’re interested in running a Bitcoin node that will never go out of sync or demand that you update your hardware, check out this tutorial I put together:

A complete beginners guide to installing a Bitcoin Full Node on Linux (2018 Edition) (hackernoon.com)
_How to compile a Bitcoin Full Node on a fresh installation of Kubuntu 18.04 without any Linux experience whatsoever._

🅂🅃🄾🄿 (@StopAndDecrypt) | Twitter (twitter.com)
_The latest Tweets from 🅂🅃🄾🄿 (@StopAndDecrypt). Fullstack Social Engineer: 10% FUD, 20% memes, 15% concentrated…_