Data and content management are two of the main capabilities of many real-world business applications, such as information portals, Wikipedia, e-commerce sites, and social media applications.
The decentralized world is no exception. During the EVM discussion, we briefly looked at the EVM's capability for storing data on Ethereum.
Although convenient, on-chain storage is not intended for general-purpose data storage, and it is very expensive. Application developers have a few options for managing and accessing decentralized data and content for decentralized applications, including Swarm (Ethereum's native storage solution), IPFS, and BigchainDB (a big data platform for blockchain). We will cover them in the rest of this section.
Swarm
Swarm provides a content distribution service for Ethereum and DApps. Here are some features of Swarm:
· It is a decentralized storage platform and a native base-layer service of the Ethereum web3 stack.
· It is intended to be a decentralized store of Ethereum's public record, as an alternative to on-chain storage.
· It allows DApps to store and distribute code, data, and content without jamming all the information onto the blockchain.
Imagine you are developing a blockchain-based medical record system. You want to keep track of when medical records are added, where they are recorded, who has accessed them, and for what purpose. All of these are immutable transaction records that you want to maintain on the blockchain. However, the medical records themselves, including physician notes, diagnoses, imaging, and so on, may not be suitable for storage on the Ethereum blockchain. Swarm and IPFS are better suited to such use cases.
DApps can create, manage, and store data and content directly in a decentralized file system, such as IPFS or Swarm, and retrieve it later using its content hash (a Swarm hash, in Swarm's case). When DApps submit transactions to the Ethereum network, those transactions can reference Swarm resources by their Swarm hash.
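To make the split between on-chain and off-chain data concrete, here is a minimal Python sketch of the medical record example. It is only an illustration under stated assumptions: the in-memory dictionary stands in for Swarm or IPFS, SHA-256 stands in for the platform's real hash, and the names store_content, AccessRecord, and fetch_verified are hypothetical rather than part of any Swarm or Ethereum API.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical in-memory stand-in for a decentralized store such as Swarm or IPFS.
off_chain_store = {}


def store_content(content: bytes) -> str:
    """Store content off-chain and return its content hash.

    SHA-256 is used for illustration only; it is not the hash Swarm uses.
    """
    content_hash = hashlib.sha256(content).hexdigest()
    off_chain_store[content_hash] = content
    return content_hash


@dataclass(frozen=True)
class AccessRecord:
    """Hypothetical on-chain payload: small metadata plus a hash reference."""
    record_hash: str  # reference to the off-chain medical record
    accessed_by: str
    purpose: str


def fetch_verified(record: AccessRecord) -> bytes:
    """Retrieve the referenced content and check it still matches the hash."""
    content = off_chain_store[record.record_hash]
    if hashlib.sha256(content).hexdigest() != record.record_hash:
        raise ValueError("content does not match the referenced hash")
    return content


notes = b"physician notes: patient reports mild headache"
tx = AccessRecord(store_content(notes), accessed_by="dr-smith", purpose="follow-up")
assert fetch_verified(tx) == notes
```

Only the small AccessRecord would go into a transaction; the bulky record stays off-chain, and the hash lets any reader detect tampering.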
Internally, Swarm maintains a specific type of content-addressed distributed hash table (DHT) across the decentralized nodes. A file or other content uploaded to the Swarm network is treated as a blob and chopped into chunks. A Merkle tree is then built from those chunks and is used to ensure content integrity. The chunks are distributed to participating nodes and stored in the DHT. When an access request is made, the content is served by the node(s) closest to the address of each chunk.
Swarm offers several APIs for accessing and managing content, including a CLI (command-line interface) and JSON-RPC APIs. JavaScript support is available through the erebos, swarm-js, and swarmgw packages, which can be leveraged by most UI/JavaScript-based DApps.
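The chunking, Merkle tree, and closest-node ideas can be sketched in a few lines of Python. This is a simplified illustration, not Swarm's actual implementation: it uses SHA-256 and a plain binary Merkle tree, it assumes "closest" means smallest XOR distance between addresses (Kademlia-style), and the function names and the 4096-byte chunk size are choices made for this example.

```python
import hashlib

CHUNK_SIZE = 4096  # bytes; illustrative chunk size for this sketch


def split_into_chunks(blob: bytes, size: int = CHUNK_SIZE) -> list:
    """Chop a blob into fixed-size chunks (the last one may be shorter)."""
    return [blob[i:i + size] for i in range(0, len(blob), size)] or [b""]


def merkle_root(chunks: list) -> bytes:
    """Hash each chunk, then hash pairs upward until a single root remains."""
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def closest_node(chunk_address: bytes, node_ids: list) -> bytes:
    """Pick the node whose ID has the smallest XOR distance to the address."""
    target = int.from_bytes(chunk_address, "big")
    return min(node_ids, key=lambda node: int.from_bytes(node, "big") ^ target)


blob = b"x" * 10000
chunks = split_into_chunks(blob)
root = merkle_root(chunks)
print(len(chunks), root.hex())
```

Changing a single byte anywhere in the blob changes the root, which is what makes the root usable as an integrity check for the whole content.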
IPFS
IPFS is similar to Swarm; it is a peer-to-peer distributed filesystem designed to store and share content across a decentralized network. Both IPFS and Swarm offer decentralized data and content storage with content-addressable hashes generated directly from the content. Both can store any kind of content, which can then be referenced from transactions in the blockchain network.
Behind the scenes, there are quite a few technical differences, mainly in how each one chops large datasets into chunks and stores them in a distributed network. IPFS may be thought of as a single BitTorrent swarm exchanging objects within one Git repository. Swarm may be seen as more tightly integrated with the Ethereum blockchain, and it has an incentive system for content sharing. However, Filecoin can be used as an overlay on top of IPFS to provide a similar incentive system.
The DApp application architecture described in the Swarm section applies to IPFS too. In the same way, IPFS offers several APIs for accessing and managing content, including a CLI, JSON-RPC APIs, and an HTTP interface. JavaScript packages and a Go library are also available, which can be leveraged by most UI/JavaScript-based or Go-based DApps.
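As a rough illustration of the HTTP interface, the Python sketch below builds (but does not send) a request to the cat endpoint of a local IPFS node's RPC API, along with the equivalent gateway URL. The addresses are assumptions based on a default local node (API on port 5001, gateway on port 8080), the helper names are hypothetical, and QmExampleCid is a placeholder rather than a real content identifier.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed addresses of a local IPFS node with default settings.
API_BASE = "http://127.0.0.1:5001/api/v0"
GATEWAY_BASE = "http://127.0.0.1:8080"


def build_cat_request(cid: str) -> Request:
    """Build a POST request asking the node's RPC API for the content of cid."""
    query = urlencode({"arg": cid})
    return Request(f"{API_BASE}/cat?{query}", method="POST")


def gateway_url(cid: str) -> str:
    """Return the read-only HTTP gateway URL for the same content."""
    return f"{GATEWAY_BASE}/ipfs/{cid}"


request = build_cat_request("QmExampleCid")  # placeholder CID, not real content
print(request.get_method(), request.full_url)
print(gateway_url("QmExampleCid"))
```

With a node running, the request object could be passed to urllib.request.urlopen to fetch the bytes; it is left unsent here so the sketch runs without a network.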
BigchainDB
BigchainDB is a decentralized database that combines traditional database and data management capabilities with blockchain features. As a blockchain database, BigchainDB is complementary to other decentralized systems, such as decentralized file storage like IPFS or Swarm, and smart contract blockchains like Ethereum or EOS. It is another alternative for storing decentralized data and content. It can serve as the data store for traditional applications, or as the decentralized data store for blockchain platforms like Ethereum. Although it can be used as a file repository, this is not recommended; it is best suited to structured or unstructured data rather than files.
Within the Ethereum community, there is a lot of interest in integrating BigchainDB with Ethereum smart contracts. Some EIPs and PoCs (proofs of concept) have been proposed to explore such integration options. There are two ways a DApp can integrate with BigchainDB. One is to interact with BigchainDB directly as the decentralized data store, through HTTP GET and POST. The other is to use the Oraclize service in smart contracts to access external data from BigchainDB, as one of the PoCs does: on successful retrieval of the data, the smart contract evaluates and executes its logic and performs the requested operation.
The following points apply when working with BigchainDB directly:
· BigchainDB offers several interfaces for connecting to BigchainDB servers and for storing and retrieving data from the blockchain database, including a CLI and HTTP APIs.
· To store data in the database, you use HTTP POST to send the data to the database server. To retrieve data from the database, you use HTTP GET.
· BigchainDB also provides database drivers that let developers connect to the network servers from high-level programming languages, such as Java, Python, and JavaScript/Node.js.
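The POST and GET interaction described above might look like the following Python sketch, which only builds the requests and does not send them. The node address (localhost, port 9984) is an assumption, the helper names are hypothetical, and the paths follow the BigchainDB HTTP API v1 as the author of this example understands it. The body passed to the POST helper must be a complete, signed transaction, which is normally prepared with one of the drivers; the dictionary used below is a placeholder and would be rejected by a real server.

```python
import json
from urllib.request import Request

# Assumed address of a BigchainDB node; adjust for a real deployment.
BDB_ROOT = "http://localhost:9984/api/v1"


def build_post_transaction(signed_tx: dict) -> Request:
    """Build the HTTP POST that submits an already signed transaction."""
    body = json.dumps(signed_tx).encode("utf-8")
    return Request(
        f"{BDB_ROOT}/transactions?mode=commit",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_get_transaction(tx_id: str) -> Request:
    """Build the HTTP GET that retrieves a stored transaction by its ID."""
    return Request(f"{BDB_ROOT}/transactions/{tx_id}", method="GET")


# Placeholder payload for illustration only; not a valid signed transaction.
post_request = build_post_transaction({"id": "example-id"})
get_request = build_get_transaction("example-id")
print(post_request.get_method(), post_request.full_url)
print(get_request.get_method(), get_request.full_url)
```

In practice a driver hides these HTTP details, but seeing the raw requests shows that the database is reached like any other web service.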
In summary, this article reviewed three leading content storage platforms for managing decentralized data on a blockchain. The next step, after picking and using one of these data storage platforms, is to use data visualization to monitor the network platform. One way to achieve this is to use a virtual data room, which can display all the useful information in one place. It allows you to assess the network data and judge whether the data you have collected is trustworthy.
Feel free to read my follow-up article, “Decentralized messaging with Whisper for Ethereum blockchain development”, to learn more about how to use Whisper messaging in your next Ethereum blockchain development project.
If you would like to explore blockchain development on an alternative platform like Hyperledger, or learn about Hyperledger projects such as Sawtooth or Iroha, visit the Comprehensive Hyperledger Training Tutorials page for an outline of our Hyperledger articles.
About Authors
This article was written by Matt Zand (founder of High School Technology Services and Coding Bootcamps) in collaboration with Brian Wu, who is a senior blockchain advisor at DC Web Makers.