Making A Case for OffChain Storage in Hyperledger Fabric [Deep Dive]

In this article, I’ll explain why OffChain storage matters in Hyperledger Fabric and walk through offchaindata, an application I built in Go to demonstrate an offchain storage implementation for Hyperledger Fabric.

OnChain and OffChain Transaction

Transactions on any blockchain platform flow through two different layers. Transactions committed to the distributed ledger of the blockchain network are considered OnChain transactions, while transactions that are executed outside the blockchain and stored in a collection database such as CouchDB or StateDB are considered OffChain.

Blockchain more than just a Storage solution

Blockchain technology stores a large set of data while also being able to provide the current state that results from each transaction. The network maintains a transaction history log of every change made to the distributed ledger data. This separates blockchain from traditional database storage technologies, which are designed only to store data in an organized manner.

Problem with OnChain transaction

In general, an OnChain transaction takes longer to execute. Performance starts to lag, and execution takes much longer, when a large queue of transactions is waiting to be executed in the network. OnChain transactions also carry an extensive cost in terms of both business expense and storage space.

OnChain Storage Calculation

IBM has performed an analysis to determine the extensive costs involved in terms of Storage space.

Source: https://www.ibm.com/downloads/cas/RXOVXAPM

Facts

Bitcoin stores 1400 transactions per block.

Hyperledger blocks are 1 megabyte in size and have 1000 transactions per block.

Each blockchain transaction is 5 KB in size, which yields 205 TPS (transactions per second).

Calculation of storage per TPS

The calculation assumes an average company operating 8 hours per day and 240 days per year.

(1 TPS / 1000 transactions per block) * 1024 KB per block * 3600 sec/hr * 8 hr/day * 240 days/year =

7,077,888 KB of data per TPS per year =

6,912 MB = 6.75 GB = 0.00659 TB per TPS per year
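The arithmetic above can be checked with a few lines of Go. This is only a sketch: the constants are the figures quoted in this section, and the function name storageKBPerYear is my own choice for illustration.

```go
package main

import "fmt"

// Figures quoted in this section; adjust them for your own network.
const (
	blockSizeKB    = 1024.0 // one 1 MB block
	txPerBlock     = 1000.0 // transactions per block
	secondsPerHour = 3600.0
	hoursPerDay    = 8.0
	daysPerYear    = 240.0
)

// storageKBPerYear returns the ledger growth in KB per year
// for a sustained rate of tps transactions per second.
func storageKBPerYear(tps float64) float64 {
	secondsPerYear := secondsPerHour * hoursPerDay * daysPerYear
	return tps * blockSizeKB * secondsPerYear / txPerBlock
}

func main() {
	kb := storageKBPerYear(1)
	fmt.Printf("%.0f KB = %.0f MB = %.2f GB = %.5f TB per TPS per year\n",
		kb, kb/1024, kb/1024/1024, kb/1024/1024/1024)
}
```

For 1 TPS this reproduces the 7,077,888 KB (6.75 GB) figure, and the result scales linearly with the transaction rate.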

OnChain Financial costs for Blockchain

IBM also provided the average enterprise-level costs for Hyperledger and for Non-Permissioned blockchains such as Ethereum.

IBM Hyperledger costs $1000 per month plus an additional $1000 per active node, so a network with five active nodes costs $6000 per month in total.

The cost per transaction in Bitcoin is $1.30, and in Ethereum it is $0.25.

In a Non-Permissioned blockchain, the per-transaction cost changes with the current value of the cryptocurrency.

The cost of a permission-based blockchain such as Hyperledger grows with the number of nodes.
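As a rough illustration of how the permissioned pricing quoted above grows with the node count, here is a small Go sketch. The two constants are the IBM figures from this section, and monthlyCostUSD is a name I made up for the example.

```go
package main

import "fmt"

// Pricing figures quoted in this section (USD per month).
const (
	baseMonthlyUSD    = 1000
	perNodeMonthlyUSD = 1000
)

// monthlyCostUSD returns the monthly cost for the given number of active nodes.
func monthlyCostUSD(activeNodes int) int {
	return baseMonthlyUSD + perNodeMonthlyUSD*activeNodes
}

func main() {
	for _, n := range []int{1, 3, 5} {
		fmt.Printf("%d active nodes: $%d per month\n", n, monthlyCostUSD(n))
	}
}
```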

So, given these transaction costs, non-transaction data such as pictures, videos, PDFs and other documents should not be stored in the blockchain ledger.

Solution with OffChain transaction

An OffChain transaction is not stored by every node in the network. Only the peers that want to keep specific transactions need to use offchain storage. OffChain also improves computational efficiency, because computation executed offchain only has to be deterministic and does not have to go through consensus.

Design Implementation for OffChain Storage

Many offchain databases can be integrated with Hyperledger Fabric to store transaction details. The offchaindata application that I have built uses CouchDB as its offchain storage. A gRPC event listener connects to the peers as a gRPC client, and for each block it writes the KVWriteSet values into the offchain storage (CouchDB). The MapReduce technique is then used to query offchain data from CouchDB.
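To make the last step of that flow concrete, here is a minimal sketch of turning a block's writes into a request body for CouchDB's _bulk_docs endpoint. It is not the code from the offchaindata repository: the KVWrite type below is a simplified, hypothetical stand-in for the entries of Fabric's KVWriteSet, bulkDocsPayload is a name I invented, and I assume every written value is a JSON object. Deletes are skipped to keep the sketch short.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// KVWrite is a simplified, hypothetical stand-in for one entry
// of a block's write set (key, value, delete flag).
type KVWrite struct {
	Key      string
	Value    []byte
	IsDelete bool
}

// bulkDocsPayload builds a JSON body of the form {"docs":[...]},
// using each write's key as the CouchDB document _id.
func bulkDocsPayload(writes []KVWrite) ([]byte, error) {
	docs := make([]map[string]interface{}, 0, len(writes))
	for _, w := range writes {
		if w.IsDelete {
			continue // deletes need the document revision; not handled here
		}
		var doc map[string]interface{}
		if err := json.Unmarshal(w.Value, &doc); err != nil || doc == nil {
			return nil, fmt.Errorf("key %q: value is not a JSON object", w.Key)
		}
		doc["_id"] = w.Key
		docs = append(docs, doc)
	}
	return json.Marshal(map[string]interface{}{"docs": docs})
}

func main() {
	// Hypothetical sample write, for illustration only.
	writes := []KVWrite{
		{Key: "user1", Value: []byte(`{"email":"alice@example.com","name":"Alice"}`)},
	}
	payload, err := bulkDocsPayload(writes)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(payload)) // body for POST /offchaindb/_bulk_docs
}
```

Batching a whole block into one request keeps the listener to a single round trip per block, which is one reasonable design choice among several.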

GitHub: https://github.com/Deeptiman/offchaindata

What is MapReduce?

MapReduce is a programming model designed to process a large set of data in parallel on a large cluster.

MapReduce has two functions:

1. Map: It produces a list of key-value pairs from a collection of documents.

2. Reduce: It combines the key-value pairs emitted by Map into a smaller set of aggregated values.

CouchDB uses the MapReduce technique to filter all collection documents. In the following example, we will see how MapReduce works for a User model.

User Model

// SampleUser is the document model stored in the offchain CouchDB collection.
type SampleUser struct {
	Email   string `json:"email"`
	Name    string `json:"name"`
	Age     string `json:"age"`
	Country string `json:"country"`
}

Collection Documents in CouchDB

So, we will create a MapReduce function to query Emails from the collection.

Configure MapReduce for Email

curl -X PUT http://127.0.0.1:5990/offchaindb/_design/emailviewdesign/ -d '{"views":{"emailview":{"map":"function(doc) { emit(doc.email,1);}", "reduce":"function (keys, values, combine) {return sum(values)}"}}}' -H 'Content-Type:application/json'

Output

{"ok": true, "id":"_design/emailviewdesign", "rev": "1-f34147f686003ff5c7da5a5e7e2759b8"}

Parameters used

Design View: emailviewdesign

MapReduce Views :

{
	"views": {
		"emailview": {
			"map": "function(doc) { emit(doc.email,1);}",
			"reduce": "function (keys, values, combine) {return sum(values)}"
		}
	}
}
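The same design document can also be created from Go instead of curl. The sketch below only builds the PUT request, so it runs without a CouchDB instance. The database name, design name and view functions are taken from the curl command above, while the struct and function names are my own.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// view holds the JavaScript source of one CouchDB view.
type view struct {
	Map    string `json:"map"`
	Reduce string `json:"reduce,omitempty"`
}

// designDoc is the JSON body of a CouchDB design document.
type designDoc struct {
	Views map[string]view `json:"views"`
}

// newEmailViewRequest builds the PUT request that creates emailviewdesign.
// baseURL is the CouchDB address, e.g. http://127.0.0.1:5990.
func newEmailViewRequest(baseURL string) (*http.Request, error) {
	doc := designDoc{Views: map[string]view{
		"emailview": {
			Map:    "function(doc) { emit(doc.email,1);}",
			Reduce: "function (keys, values, combine) {return sum(values)}",
		},
	}}
	body, err := json.Marshal(doc)
	if err != nil {
		return nil, err
	}
	url := baseURL + "/offchaindb/_design/emailviewdesign"
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newEmailViewRequest("http://127.0.0.1:5990")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	// To actually create the view, send it with http.DefaultClient.Do(req).
}
```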

Query the Reduce function to count the total number of emails

curl -X GET 'http://127.0.0.1:5990/offchaindb/_design/emailviewdesign/_view/emailview?reduce=true'

Output

{"rows":[
	{"key":null,"value":7}
]}
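On the application side, the reduce response can be decoded with encoding/json. This is a small sketch that assumes the response shape shown above; the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// viewRow is one row of a CouchDB view response.
// Key is null for an ungrouped reduce and a string when grouping by email.
type viewRow struct {
	Key   interface{} `json:"key"`
	Value int         `json:"value"`
}

// viewResponse is the top-level shape of the view output shown above.
type viewResponse struct {
	Rows []viewRow `json:"rows"`
}

// totalFromReduce extracts the single reduced value from a view response.
func totalFromReduce(body []byte) (int, error) {
	var resp viewResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, err
	}
	if len(resp.Rows) == 0 {
		return 0, nil // no rows: treat as a count of zero
	}
	return resp.Rows[0].Value, nil
}

func main() {
	body := []byte(`{"rows":[{"key":null,"value":7}]}`)
	total, err := totalFromReduce(body)
	if err != nil {
		panic(err)
	}
	fmt.Println("total emails:", total)
}
```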

Query the view with grouping to list every email and its count

curl -X GET 'http://127.0.0.1:5990/offchaindb/_design/emailviewdesign/_view/emailview?group=true'

Output

{"rows":[
	{"key":"[email protected]","value":1},
	{"key":"[email protected]","value":1},
	{"key":"[email protected]","value":1},
	{"key":"[email protected]","value":1},
	{"key":"[email protected]","value":1},
	{"key":"[email protected]","value":1},
	{"key":"[email protected]","value":1}
]}

This is how MapReduce works. We can also create MapReduce functions for the other fields to query them from CouchDB.
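To see what the view is doing without CouchDB in the picture, the same map and reduce steps can be mirrored in plain Go. This is only an in-memory illustration of the idea, not how CouchDB executes views; the helper names and the sample users are made up.

```go
package main

import "fmt"

// SampleUser mirrors the User model from this article.
type SampleUser struct {
	Email   string `json:"email"`
	Name    string `json:"name"`
	Age     string `json:"age"`
	Country string `json:"country"`
}

// pair is one emitted key-value pair.
type pair struct {
	Key   string
	Value int
}

// mapEmail mirrors the view's map function: emit(doc.email, 1).
func mapEmail(users []SampleUser) []pair {
	pairs := make([]pair, 0, len(users))
	for _, u := range users {
		pairs = append(pairs, pair{Key: u.Email, Value: 1})
	}
	return pairs
}

// reduceSum mirrors sum(values) over all pairs (reduce=true).
func reduceSum(pairs []pair) int {
	total := 0
	for _, p := range pairs {
		total += p.Value
	}
	return total
}

// reduceByKey mirrors the grouped query (group=true): one sum per distinct key.
func reduceByKey(pairs []pair) map[string]int {
	grouped := make(map[string]int)
	for _, p := range pairs {
		grouped[p.Key] += p.Value
	}
	return grouped
}

func main() {
	// Hypothetical sample documents.
	users := []SampleUser{
		{Email: "alice@example.com", Name: "Alice", Age: "30", Country: "IN"},
		{Email: "bob@example.com", Name: "Bob", Age: "41", Country: "US"},
	}
	pairs := mapEmail(users)
	fmt.Println("total:", reduceSum(pairs))
	fmt.Println("by email:", reduceByKey(pairs))
}
```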

Conclusion

All of these queries are performed in the offchain storage space and completely bypass the onchain ledger. This increases computational efficiency when querying a large set of data.

There is no transaction cost involved in performing an offchain query, whereas a similar onchain query has higher transaction costs. In the case of a public blockchain, OffChain storage can also be used to store sensitive private data, because not all participants in the blockchain network are aware that an additional, separate layer of storage is used to store the data.

So, this is an overview of the significant use case for offchain storage in any blockchain network. Please check out the offchaindata application on GitHub and share your feedback.

I hope you find this article useful :)

Thanks