How do dApps store data?

Written by shrutiavinodh | Published 2018/08/11
Tech Story Tags: blockchain | dapps | data-storage | ipfs | ethereum



One of the earliest questions I had while trying out blockchain development was, “Where is everything stored?”

I was following this lovely tutorial by Mahesh Murthy and was wondering how everything worked in production.

For example, to deploy on Ropsten, a common Ethereum testnet, I first had to download an entire copy of the blockchain and sync with the peer nodes. That makes sense, since it is what makes a blockchain special: multiple copies of the entire chain live on different nodes, so the nodes can reach consensus and every transaction can be verified.

However, with blockchain sizes running into gigabytes, this doesn’t really seem feasible for every user. Popular crypto collectible mobile apps are only a couple of MB in size, so that is clearly not what is happening there.

And how does storing large files like videos even work if every node has to hold a complete copy?

So, what is going on?

A Peer-to-Peer File System

IPFS is a great example of a peer-to-peer file system. Instead of relying on a centralized server, IPFS stores files in a peer-to-peer fashion, much like BitTorrent. Every file is addressed by a hash of its content. Hashes in the original format (CIDv0) are easy to spot since they all start with the letters Qm. Given the hash, you can retrieve the file from any node that has it.
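To see where the Qm prefix comes from, here is a minimal Python sketch of content addressing. It hashes raw bytes with SHA-256, prepends the two-byte multihash header, and base58-encodes the result. The function names are my own, and this is an assumption-laden simplification: real IPFS wraps and chunks a file before hashing it, so the address printed here will not match what IPFS itself reports for the same file.

```python
import hashlib

# Bitcoin-style base58 alphabet (no 0, O, I, or l), as used by CIDv0 hashes.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as base58, keeping leading zero bytes as '1' characters."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def content_address(content: bytes) -> str:
    """Return a base58 SHA-256 multihash of the raw bytes (illustrative only)."""
    digest = hashlib.sha256(content).digest()
    # Multihash header: 0x12 means sha2-256, 0x20 is the digest length (32 bytes).
    multihash = bytes([0x12, 0x20]) + digest
    return base58_encode(multihash)


address = content_address(b"hello dApp")
print(address)  # a 46-character string starting with "Qm"
```

Because the header bytes are always the same, every address built this way starts with Qm, and changing even one byte of the content gives a completely different address.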

How does this help with our storage issue?

Simple. Store only the content hashes on the blockchain. The chain itself stays small, and each hash gives you an easy way to retrieve (and verify) the large file that is stored elsewhere.

The folks over at Coral Health have provided a great write-up on this.
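Here is a rough Python sketch of that split. Both classes are hypothetical stand-ins that I made up for illustration: `OffChainStore` plays the role of something like IPFS, and `Ledger` plays the role of the blockchain. Neither one talks to a real network.

```python
import hashlib


class OffChainStore:
    """Hypothetical stand-in for a content-addressed store such as IPFS."""

    def __init__(self):
        self._blobs = {}  # maps hex hash -> raw bytes

    def put(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()
        self._blobs[key] = content
        return key

    def get(self, key: str) -> bytes:
        content = self._blobs[key]
        # Re-hash on retrieval so a tampered blob is detected.
        if hashlib.sha256(content).hexdigest() != key:
            raise ValueError("content does not match its hash")
        return content


class Ledger:
    """Hypothetical stand-in for the blockchain: it records hashes only."""

    def __init__(self):
        self.entries = []

    def record(self, content_hash: str) -> int:
        self.entries.append(content_hash)
        return len(self.entries) - 1  # position of the new entry


store = OffChainStore()
ledger = Ledger()

video = b"\x00" * 5_000_000  # pretend this is a 5 MB video file
index = ledger.record(store.put(video))

# The ledger entry is 64 hex characters, however large the file is.
print(len(ledger.entries[index]), len(store.get(ledger.entries[index])))
```

Every node that copies the ledger only copies the short hash. Anyone who wants the video fetches it from the store and can check it against the hash on the ledger.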

Decentralized storage (Cloud solutions)

This is the approach Ethereum Swarm seems to go for. From what I understand, Swarm also operates in a fashion similar to BitTorrent, in that it offers peer-to-peer storage for dApps.

From its orange paper:

Swarm is a peer-to-peer network of nodes providing distributed digital services by contributing resources (storage, message forwarding, payment processing) to each other. These contributions are accurately accounted for on a peer to peer basis, allowing nodes to trade resource for resource, but offering monetary compensation to nodes consuming less than they serve.

They aim to replace the World Wide Web and its centralized servers with a decentralized version. In the same way DNS is used to look up web pages, Swarm relies on a smart contract system called ENS (the Ethereum Name Service), which lets domain owners register a reference to their content, even though that content is not stored on a traditional centralized server.

It honestly looks like a fascinating way to revolutionize the web, but I digress. The documentation here is well worth a read, though.
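The idea of a name pointing at a content reference can be sketched in a few lines of Python. This is a toy in-memory registry, not the real ENS contracts or their API. The class, the name `myblog.eth`, the owner labels, and the placeholder references are all made up for illustration.

```python
class NameRegistry:
    """Toy name registry: maps a name to (owner, content reference)."""

    def __init__(self):
        self._records = {}

    def register(self, name, owner, content_ref):
        record = self._records.get(name)
        # Only the current owner may overwrite an existing record.
        if record is not None and record[0] != owner:
            raise PermissionError(f"{name} is owned by someone else")
        self._records[name] = (owner, content_ref)

    def resolve(self, name):
        """Return the content reference for a name, or None if unregistered."""
        record = self._records.get(name)
        return None if record is None else record[1]


registry = NameRegistry()
registry.register("myblog.eth", owner="alice", content_ref="hash-of-site-v1")

# Publishing a new version only means pointing the name at a new hash.
registry.register("myblog.eth", owner="alice", content_ref="hash-of-site-v2")
print(registry.resolve("myblog.eth"))
```

Visitors keep using the same readable name, while the content behind it is found by its hash rather than by a server address.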

But hey, don’t both of these sound similar?

Sure they do. So much so that the main author of Swarm did a write-up on this very fact. Apart from different peer management protocols and different underlying philosophies (IPFS wants to integrate all existing protocols, while Swarm wants to be used along with smart contracts and Whisper to provide a truly decentralized web), one key difference seems to be that you can use Swarm as cloud hosting storage. You can start a node, upload, and go offline (they call this “upload and disappear”). Your data will still reside on the swarm. In IPFS, by contrast, content you publish is served from your own hard drive, so it is only available while your node, or another node that has chosen to keep a copy, is online.

Wait, what about distributed NoSQL databases like MongoDB/RethinkDB?

Good question. Honestly, I’m not sure. In databases like MongoDB, all the nodes in a cluster trust each other. The problem with that is that a single malicious node can feed false information to the rest. This is related to a concept called Byzantine Fault Tolerance.

Byzantine Fault Tolerance is absolutely crucial to blockchains. In a typical distributed network, even a few malicious nodes can destroy the reliability of the whole system. Byzantine Fault Tolerance is the property that lets the blockchain keep working correctly even in the presence of Byzantine faults, that is, malicious nodes or nodes that spread false information. How this is achieved is really interesting, but that is for another day.
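To get a feel for the problem, here is a toy Python sketch of reading a value from several replicas when some of them may lie. It is not how any blockchain reaches consensus. It assumes the honest replicas already agree with each other and that at most `max_faulty` replicas are malicious, and the function name is my own.

```python
from collections import Counter


def read_with_quorum(responses, max_faulty):
    """Return a value only if at least max_faulty + 1 replicas reported it.

    With at most max_faulty liars, any value backed by max_faulty + 1
    matching replies must have come from at least one honest replica.
    """
    if not responses:
        raise RuntimeError("no replies received")
    value, votes = Counter(responses).most_common(1)[0]
    if votes < max_faulty + 1:
        raise RuntimeError("no value has enough matching replies to be trusted")
    return value


# Four replicas answer; one of them is lying about the balance.
replies = ["balance=10", "balance=10", "balance=999", "balance=10"]
print(read_with_quorum(replies, max_faulty=1))
```

A client that trusted whichever replica answered first could be handed the false balance. Requiring matching replies from more nodes than could possibly be malicious rules that out.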

BigchainDB and TiesDB provide alternative ways for dApps to store data, but I’ll go into more detail on those in another write-up.


Published by HackerNoon on 2018/08/11