The current Internet architecture concentrates many small entities into a confined structure referred to as centralization. Ever since the advent of the Internet, sharing information online has depended on centralizing resources in a database, such that all peers in the interconnected network are centrally controlled.
With decentralized systems, there is no reliance on intermediaries to facilitate connections. Instead, individual nodes in the network take control of their own data while sharing information directly with other nodes in the network.
Today, the centralization of the Internet means that only a few companies control it, holding users' private data and dictating how information flows. Specifically, the Internet is dominated by a group of companies known as FAANG: Facebook, Apple, Amazon, Netflix, and Google.
Owing to the billions of data points shared through such platforms, the databases controlled by these companies contain valuable information that drives marketing as well as other commercial actors.
The wealth of information contained in the centralized Internet can have far-reaching positive ramifications for researchers as well as advertisers, depending on the application of analytics techniques.
Security concerns and data privacy issues – from shady mismanagement of personal data and malicious actors to spammy remarketing advertisements – have inspired a new movement of tech entrepreneurs, Internet activists, and influential technologists to evangelize solutions for building decentralized systems and organizations.
Moving away from the centralized servers that many organizations use today will mean significant changes in the manner in which information is stored and shared online.
In decentralized systems, computers act as nodes in the network, contributing computing power and sharing control of the distributed storage system.
Meanwhile, distributed systems have also been subject to security and privacy concerns, owing to the interconnections between the various databases in the network.
With decentralized systems, data is no longer stored in a privately owned database such as a cloud-hosted account, which reduces hacking threats and fears of external control over the stored information.
With the advent of blockchain technology, distributed systems are likely to adopt shared-ledger mechanisms for sharing and distributing information, thus helping to protect the privacy of user information.
The distributed Internet of Things has recently advanced to accommodate smart systems that make it easy to identify solutions for problems in agriculture, the industrial sector, and domestic settings. The future of the technology is expected to encompass sensor devices, mobile technologies, and RFID (radio frequency identification) systems.
The Exploding Billion Dollar Computing Power Storage Market
According to Larry Dignan, Editor-in-Chief of ZDNet, cloud computing in 2020 is expected to become more focused on verticals and to turn into a sales ground war as the leading vendors battle for market share. He predicted four industry-wide trends:
Multi-Cloud: Companies are well aware of vendor lock-in and want to abstract their applications so that they can be moved across clouds.
Data Acquisition: The more corporate data resides in a cloud, the stickier the customer is to the vendor.
AI & IoT: These will act as differentiators among the top cloud service providers.
Sales Tactics: Intensity and uncertainty.
The major market players of the global cloud computing market are Amazon.com Inc., Microsoft Corporation, Alphabet Inc., Oracle Corporation, Cisco Systems, Inc., Salesforce.com, Inc., SAP SE, VMware, Inc., IBM Corporation, Alibaba Group Holding Ltd., Rackspace Inc., Adobe Systems Inc., SAS Institute Inc., Dell EMC Corp. and TIBCO Software Inc.
Is it economically and technically viable to use blockchain for data storage?
Blockchain has complemented data storage by making it possible to keep huge amounts of data in decentralized databases. As a result, the data is more secure, since no single party has the capability to control it.
Extending this concept leads to decentralized data ownership – a model in which users are remunerated for the value of their data when they decide to grant access to third parties.
Unlike traditional cloud servers, decentralized cloud storage does not keep data on one particular centralized server; instead, it uses different nodes located across the world, independent of each other. The nodes are hosted by different providers rather than being centralized under one entity.
This approach started with the BitTorrent protocol, which was designed for peer-to-peer file sharing. The InterPlanetary File System (IPFS) protocol has been one of the biggest evolutions of decentralized data storage. IPFS is an open-source project created by Protocol Labs, an R&D lab for network protocols and a former Y Combinator startup.
There are two fundamental approaches to decentralized data storage: on-chain and off-chain. In the on-chain design, all user data is stored directly within the blocks of the blockchain.
This enhanced sense of security comes at the price of maintaining full nodes, a far more expensive option. Traditional cryptocurrencies such as BTC, with a maximum block size of 1MB and a capacity of 3-4 transactions per second (TPS), cannot serve the purpose of data storage.
Simply put, if users were able to upload even a few megabytes of data, the network would quickly become overloaded. Moreover, this would cost a fortune in network fees and undermine decentralization, given the heavy investment required to purchase and run the necessary machines.
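A back-of-the-envelope calculation illustrates the point, using Bitcoin's well-known parameters (a 1MB block size limit and a ~10-minute block interval):

```python
# Rough estimate of Bitcoin's capacity if used as a data store.
# Assumes the well-known ~1 MB block size limit and ~10-minute block interval.

BLOCK_SIZE_BYTES = 1_000_000      # ~1 MB per block
BLOCK_INTERVAL_S = 600            # one block every ~10 minutes

blocks_per_day = 24 * 3600 // BLOCK_INTERVAL_S        # 144 blocks per day
bytes_per_day = blocks_per_day * BLOCK_SIZE_BYTES     # ~144 MB per day, network-wide

# Sustained throughput, for comparison with ordinary storage systems
throughput_bps = BLOCK_SIZE_BYTES / BLOCK_INTERVAL_S  # ~1.7 KB/s

print(f"Blocks per day: {blocks_per_day}")
print(f"Max data per day: {bytes_per_day / 1_000_000:.0f} MB")
print(f"Sustained throughput: {throughput_bps / 1000:.1f} KB/s")
```

Even at full capacity, the entire network can absorb well under 150 MB of new data per day, so a handful of users uploading "a few megabytes" each would saturate it almost immediately.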
Off-chain data refers to any non-transactional data that is too large to be stored efficiently on a blockchain, or that needs to be able to be changed or deleted.
There are several key issues with using existing data stores in new blockchain-based projects, as categorized in the IBM report titled “Why new off-chain storage is needed for blockchains“.
The report dates back to 2018, an early era for public blockchains; at the time of its writing, storing data on-chain made little sense.
As mentioned, a relatively easy technical solution for avoiding blockchain costs is to store hashes on the blockchain instead of the data itself. But off-chain data storage comes with its own set of flaws.
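The hash-anchoring pattern is straightforward to sketch. The snippet below is a minimal illustration; `on_chain_record` is just a local variable standing in for a real blockchain transaction:

```python
import hashlib

def anchor(data: bytes) -> str:
    """Compute the digest that would go on-chain (32 bytes instead of the full payload)."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, on_chain_hash: str) -> bool:
    """Anyone holding the off-chain data can check it against the on-chain digest."""
    return hashlib.sha256(data).hexdigest() == on_chain_hash

document = b"large file kept in off-chain storage"
on_chain_record = anchor(document)   # only this 64-hex-char digest goes on-chain

assert verify(document, on_chain_record)             # untampered data checks out
assert not verify(document + b"!", on_chain_record)  # any modification is detected
```

Note the trade-off the text describes: the chain can prove the off-chain data was not tampered with, but it cannot guarantee the off-chain copy remains available.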
That is why, to this day, the great majority of data storage solutions on the market are off-chain, utilising blockchain Layer 2 (off-chain) infrastructure platforms.
Off-chain solutions tend to have a weak spot, which is none other than security: by exporting data outside the blockchain, we weaken our security guarantees, since more intermediaries become involved.
The blockchain community has pursued different approaches to explore the prospect of storing data exclusively on-chain, balancing two competing requirements: scalability and increased block size.
ILCoin & RIFT Protocol
Among the scalability enablers is the RIFT Protocol. ILCoin’s Decentralized Cloud Blockchain (DCB) and RIFT allow on-chain storage of files in unlimited volumes. Diving deeper into their architecture, we can extract the following:
Bigger blocks enabling more storage: ILCoin has released a 5Gb block (block #310280) – the biggest stable block on the market
Replication of files via a second layer of Mini-Blocks, which are processed independently of the mined blocks
Mini-Blocks are not mined but carry references to the transactions, in the same way as the mined blocks do
Security of the DCB is ensured by the Command Chain Protocol (C2P). Mini-Blocks are the same as traditional blocks except that they are not mined; they are replicated in the same way as fractals.
A Mini-Block hash is generated automatically by the code, which eliminates the need for mining. Mini-Blocks are contained as a layer inside the traditional (parent) block by way of a reference system.
RIFT makes it possible to process a huge number of transactions through Mini-Blocks, which expands the capacity of the parent blocks.
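The reference scheme described above can be illustrated with a hypothetical data structure. The field names and the plain SHA-256 hash below are assumptions for illustration, not ILCoin's actual format:

```python
import hashlib
from dataclasses import dataclass, field

def h(data: bytes) -> str:
    # Stand-in hash function; the real protocol's hashing scheme may differ.
    return hashlib.sha256(data).hexdigest()

@dataclass
class MiniBlock:
    transactions: list          # raw transactions carried by this Mini-Block
    def block_hash(self) -> str:
        # Generated automatically by code -- no mining / proof-of-work involved.
        return h(b"".join(self.transactions))

@dataclass
class ParentBlock:
    mini_blocks: list = field(default_factory=list)
    def mini_block_refs(self) -> list:
        # The mined parent block holds only references (hashes) to its
        # Mini-Blocks, so capacity grows without bloating the mined structure.
        return [mb.block_hash() for mb in self.mini_blocks]

parent = ParentBlock()
for i in range(3):
    txs = [f"tx-{i}-{j}".encode() for j in range(1000)]
    parent.mini_blocks.append(MiniBlock(transactions=txs))

refs = parent.mini_block_refs()
assert len(refs) == 3 and all(len(r) == 64 for r in refs)
```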
The transactional speed of the RIFT Protocol is tens of times faster than Visa, and tens of thousands of times faster than the Bitcoin network. The new protocol operates 5Gb stable blocks while increasing transaction capacity to 23,140,987 tx/block in the case of a 3-minute block generation time and a 232-byte transaction weight.
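The quoted figure can be sanity-checked with simple arithmetic. Two assumptions are needed: that "5Gb" means 5 GiB, and that there is a small fixed per-block overhead (the 136 bytes below is chosen to match the published number, not a documented constant):

```python
# Sanity check of RIFT's published throughput figures.
# Assumptions: "5Gb" = 5 GiB; ~136 bytes of per-block overhead (illustrative);
# 232-byte transactions and a 3-minute (180 s) block time, as quoted in the text.

BLOCK_SIZE = 5 * 1024**3   # 5 GiB = 5,368,709,120 bytes
OVERHEAD = 136             # assumed fixed block overhead
TX_WEIGHT = 232            # bytes per transaction
BLOCK_TIME_S = 180         # 3-minute block generation time

tx_per_block = (BLOCK_SIZE - OVERHEAD) // TX_WEIGHT
tps = tx_per_block // BLOCK_TIME_S

print(f"Transactions per block: {tx_per_block:,}")  # matches the quoted 23,140,987
print(f"Effective throughput: {tps:,} tx/s")
```

At roughly 128,000 tx/s under these assumptions, the claim of being orders of magnitude beyond Bitcoin's 3-4 TPS holds up arithmetically.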
Since the end of November, 5Gb blocks have been live and can be checked in the Block Explorer.
BitTorrent File System (BTFS)

BitTorrent File System (BTFS) is both a protocol and a network implementation that provides a content-addressable, peer-to-peer mechanism for storing digital content in a decentralized file system.
It is paired with the Tron ecosystem and the BitTorrent network, and started as a fork of the IPFS implementation. IPFS itself was built upon a collection of state-of-the-art technologies such as BitTorrent and distributed hash tables (DHTs).
The BTFS network architecture comprises several micro-services that serve Renters (who consume the BTFS network's storage by paying BTT) and Hosts (who provide storage space to the BTFS network and earn BTT rewards).
The following micro-services run in the BTFS network.
Status Server
The status server stores network system metrics used to improve the BTFS network. This data powers the functionality of the other micro-services.
BTFS Hub
The BTFS Hub works to provide storage renters with the most reliable hosts on the BTFS network. The Hub achieves this by calculating a score for every BTFS host using vital metrics such as available storage space, host uptime and age, and proximity to the file storage renter. The Hub then recommends hosts based on renter-selected preferences and needs.
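The Hub's exact scoring formula is not public; the sketch below is a hypothetical illustration of a weighted score over the kinds of metrics the Hub is described as using. All weights and field names are invented:

```python
from dataclasses import dataclass

@dataclass
class HostMetrics:
    free_space_gb: float   # available storage space
    uptime_ratio: float    # fraction of time online, 0.0-1.0
    age_days: int          # how long the host has been on the network
    latency_ms: float      # proximity to the renter (lower is closer)

def host_score(m: HostMetrics) -> float:
    """Hypothetical weighted score: higher is better. Weights are illustrative only."""
    return (
        0.35 * min(m.free_space_gb / 1000, 1.0)        # cap credit at 1 TB free
        + 0.35 * m.uptime_ratio
        + 0.10 * min(m.age_days / 365, 1.0)            # cap credit at one year
        + 0.20 * (1.0 - min(m.latency_ms / 500, 1.0))  # closer is better
    )

reliable = HostMetrics(free_space_gb=2000, uptime_ratio=0.99, age_days=400, latency_ms=40)
flaky = HostMetrics(free_space_gb=100, uptime_ratio=0.60, age_days=20, latency_ms=300)
assert host_score(reliable) > host_score(flaky)  # the Hub would rank `reliable` first
```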
BTFS Guard
One of the core guarantees of any large-scale file storage system is the availability of stored files. When a file is stored in the BTFS network, it undergoes Reed-Solomon encoding and is then split into 30 shards, stored on 30 selected hosts. With BTFS Guard, hosts are regularly challenged with proofs of storage to ensure file integrity and availability. Should the threshold of missing shards be reached due to hosts being unavailable on the network, BTFS Guard proactively performs the file repair process to ensure file integrity for all renters.
Escrow
The BTFS Escrow service ensures the safe and secure transaction of funds between renters and hosts per the storage contract agreements.
BTFS is currently in BETA mode, and developers need to reach out to the Tron Foundation to request access.
Swarm

Swarm is a distributed storage platform and content distribution service, a native base-layer service of the Ethereum web3 stack that aims to provide a decentralized and redundant store for dapp code, user data, and blockchain and state data.
Swarm sets out to provide various base layer services for web3, including node-to-node messaging, media streaming, decentralised database services and scalable state-channel infrastructure for decentralised service economies.
From the end user's perspective, Swarm is not that different from the world wide web; in the background, however, content is hosted on a peer-to-peer storage network instead of on individual servers.
But first, let’s look into Swarm’s data structure. There are three main components that make up the Swarm decentralised storage system:
Chunks: These are pieces of data of limited size (max 4K) that act as the basic unit of storage and retrieval in Swarm. Each chunk is identified by an address derived from its content.
Reference: This is a unique identifier of a file that allows clients to retrieve and access the content.
Manifest: This is a data structure describing file collections. It specifies paths and corresponding content hashes allowing for URL-based content retrieval.
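These three concepts can be sketched in a few lines. The hashing and chunking details below are simplifications (Swarm's real chunker uses its own BMT hash and builds a Merkle tree over the chunks):

```python
import hashlib

CHUNK_SIZE = 4096  # Swarm's basic unit of storage is a chunk of at most 4K

def split_into_chunks(content: bytes) -> list:
    return [content[i:i + CHUNK_SIZE] for i in range(0, len(content), CHUNK_SIZE)]

def chunk_address(piece: bytes) -> str:
    # Simplification: plain SHA-256 stands in for Swarm's BMT chunk hash.
    return hashlib.sha256(piece).hexdigest()

def file_reference(content: bytes) -> str:
    """Simplified file reference: a hash over the ordered chunk addresses."""
    addresses = "".join(chunk_address(c) for c in split_into_chunks(content))
    return hashlib.sha256(addresses.encode()).hexdigest()

# A manifest maps URL paths to file references, enabling URL-based retrieval.
site = {
    "index.html": b"<h1>hello swarm</h1>",
    "app.js": b"console.log('dapp code');" * 400,  # >4K, so it spans multiple chunks
}
manifest = {path: file_reference(data) for path, data in site.items()}

assert len(split_into_chunks(site["app.js"])) > 1                    # big file spans chunks
assert manifest["index.html"] == file_reference(site["index.html"])  # references are deterministic
```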
This peer-to-peer network is self-sustaining thanks to a built-in incentive system that uses peer-to-peer accounting. The incentives are made possible by a public blockchain that allows trading resources for payment.
Swarm also deeply integrates with the DevP2P multi-protocol network layer of Ethereum. DevP2P is a set of network protocols which essentially form the Ethereum peer-to-peer network.
Adding to the above, Swarm links to the Ethereum blockchain for domain name resolution (ENS), service payments, and content availability insurance.
For blockchain developers, the key takeaways on migrating to on-chain data storage solutions can be summarized as follows:
The tectonic shift towards decentralized storage solutions is much closer than many tech gurus, venture capitalists, or even blockchain evangelists currently believe.
With 30 billion interconnected devices expected by 2020 and mature companies building decentralised applications to distribute and store data, it is safe to assume that decentralized data storage solutions will play a critical role and challenge the dominance of Amazon and Microsoft in the computing storage market.