Data access has played a prominent role in every technology trend in the history of software. Data access technologies such as databases, search engines or query APIs are so ubiquitous that we barely think about them when architecting software solutions. As Web 3.0 and decentralized applications powered by blockchain technologies evolve, infrastructure blocks such as data access will become more relevant. However, solving data access in the blockchain has proven to be a very challenging endeavor that forces developers to spend significant amounts of time writing infrastructure code. Among the Web3 data access solutions on the market, The Graph Protocol is the one I particularly like because of its simplicity and clever use of modern technologies.
We came across The Graph a few months ago during one of our blockchain implementations and have been testing and tracking the project ever since. The main idea behind The Graph is to make blockchain data queryable by leveraging established data access protocols such as GraphQL. While that idea sounds conceptually trivial, the implementation is full of non-trivial challenges.
In the last five decades of the software industry, every technology trend has seen an improvement in data access technologies, as each trend was able to build on the infrastructure created by the previous ones. From file systems to the recent big data movement, the production cycles of data access technologies became shorter and their capabilities increasingly sophisticated. That evolutionary picture changed completely with the advent of blockchain technologies because, unlike other technology movements, the blockchain space reimagines data access starting from the storage and network protocol levels. From that perspective, most of the data access technologies and best practices from previous technology movements prove impractical when applied to blockchains. Plain and simple, Web 3.0 deserves a Web 3.0 data access protocol.
What makes blockchain data access so challenging? In my opinion, this challenge has its roots in three fundamental causes:
· Decentralization: Data in a blockchain lives in a decentralized network of nodes that are constantly replicating records among themselves. From the data access perspective, this model is far more complicated than centralized database infrastructures.
· Opacity: Data in the blockchain is subject to different levels of encoding and hashing (binary ABI-encoded payloads, hashed event signatures), which makes it very difficult to interpret. A query protocol needs to know which attributes it can query for, and that information is not easily accessible in blockchain stacks.
· Sequential Data Storage: Data in blockchains is captured in transactions stored in a sequential group of blocks. That block-transaction structure offers very poor navigation capabilities, and efficient navigation is exactly what a solid data query protocol needs.
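To make the navigation problem concrete, here is a minimal sketch under simplifying assumptions: the chain is modeled as an in-memory array of blocks, and the `Tx` and `Block` types and the `findTxsTo` function are hypothetical, not part of any blockchain client API. Without an index, finding every transaction sent to one address means visiting every transaction in every block.

```typescript
// Hypothetical, simplified data model: an ordered array of blocks,
// each holding an ordered array of transactions.
interface Tx {
  hash: string;
  from: string;
  to: string;
}

interface Block {
  number: number;
  txs: Tx[];
}

// Without an index, answering "which transactions were sent to this
// address?" requires walking every block and every transaction in order.
function findTxsTo(chain: Block[], address: string): { result: Tx[]; visited: number } {
  const result: Tx[] = [];
  let visited = 0; // number of transactions inspected
  for (const block of chain) {
    for (const tx of block.txs) {
      visited++;
      if (tx.to === address) {
        result.push(tx);
      }
    }
  }
  return { result, visited };
}

// A toy chain of 3 blocks with 2 transactions each.
const chain: Block[] = [0, 1, 2].map((n) => ({
  number: n,
  txs: [
    { hash: `0x${n}a`, from: "0xalice", to: "0xbob" },
    { hash: `0x${n}b`, from: "0xbob", to: "0xcarol" },
  ],
}));

const { result, visited } = findTxsTo(chain, "0xcarol");
console.log(result.length, visited); // 3 6
```

The `visited` count grows with the total number of transactions on the chain rather than with the number of matches, which is why querying raw block data directly scales poorly.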
Inverting the three challenges above suggests that a robust Web 3.0 data access stack should have three main capabilities:
i. Ability to access information as if it were stored in a centralized repository.
ii. Ability to query records based on their attributes.
iii. Ability to efficiently navigate blockchain data based on specific criteria.
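One way to read these three capabilities is as an index built once over the chain and then queried by attribute. The sketch below is a hypothetical in-memory illustration: the `TransferRecord` type and the `buildIndex` and `transfersTo` helpers are our own, not part of The Graph, and the code assumes records arrive in chain order.

```typescript
// Hypothetical indexed record: one entry per Transfer event.
interface TransferRecord {
  block: number;
  tokenId: string;
  to: string;
}

// Build an attribute index once: recipient address -> its records.
// Assumes records arrive in chain order, so each bucket stays sorted by block.
function buildIndex(records: TransferRecord[]): Map<string, TransferRecord[]> {
  const index = new Map<string, TransferRecord[]>();
  for (const record of records) {
    const bucket = index.get(record.to) ?? [];
    bucket.push(record);
    index.set(record.to, bucket);
  }
  return index;
}

// Query by attribute (ii), then narrow by block range (iii). Callers see
// plain records, not blocks or nodes (i).
function transfersTo(
  index: Map<string, TransferRecord[]>,
  owner: string,
  fromBlock: number,
  toBlock: number,
): TransferRecord[] {
  return (index.get(owner) ?? []).filter((r) => r.block >= fromBlock && r.block <= toBlock);
}

const index = buildIndex([
  { block: 10, tokenId: "0x1", to: "0xbob" },
  { block: 12, tokenId: "0x2", to: "0xcarol" },
  { block: 15, tokenId: "0x3", to: "0xbob" },
  { block: 20, tokenId: "0x4", to: "0xbob" },
]);

console.log(transfersTo(index, "0xbob", 11, 20).map((r) => r.tokenId).join(", ")); // 0x3, 0x4
```

Lookups by owner touch only that owner's records instead of the whole chain; a production indexer would keep such an index in a database rather than in a `Map`.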
Some of these challenges have been solved by isolated technologies, but we haven’t had a consistent stack that puts them all together.
Conceptually, The Graph is a decentralized protocol for indexing and querying blockchain data. Working with The Graph starts with a manifest that describes how the blockchain data should be represented; the manifest can specify the attributes for a specific protocol or DApp. Once the manifest is created, a Graph node captures the on-chain events from that protocol or application and indexes them, using the manifest as a guideline. The indexed data is stored in a database (Postgres in the current Graph node), while the manifest and its mapping files are published to IPFS. Finally, the data is exposed through APIs based on the popular GraphQL query language, and the Graph node translates incoming GraphQL queries into queries against its store.
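The capture step can be pictured as a filter driven by the manifest: keep only the logs emitted by the configured contract whose event matches a declared handler, and hand each one to that handler. This is a simplified sketch, not Graph node code; the `Log` and `ManifestSketch` types and `processLogs` are hypothetical, and real Ethereum logs identify events by `topics[0]`, the Keccak-256 hash of the event signature, rather than by the plain-text signature used here for readability.

```typescript
// Hypothetical, simplified log shape. Real logs carry topics[0] (the
// Keccak-256 hash of the event signature); we use the readable signature.
interface Log {
  address: string;
  signature: string;
  args: string[];
}

// Hypothetical manifest subset: one contract and its event handlers.
interface ManifestSketch {
  address: string;
  eventHandlers: { event: string; handler: (log: Log) => void }[];
}

// Dispatch each log whose contract address and event signature match the
// manifest to the corresponding handler; return how many were handled.
function processLogs(manifest: ManifestSketch, logs: Log[]): number {
  let handled = 0;
  for (const log of logs) {
    // Ethereum addresses are case-insensitive hex, so compare lowercased.
    if (log.address.toLowerCase() !== manifest.address.toLowerCase()) continue;
    const entry = manifest.eventHandlers.find((h) => h.event === log.signature);
    if (entry) {
      entry.handler(log);
      handled++;
    }
  }
  return handled;
}

const captured: Log[] = [];
const manifest: ManifestSketch = {
  address: "0x06012c8cf97BEaD5deAe237070F9587f8E7A266d",
  eventHandlers: [
    { event: "Transfer(address,address,uint256)", handler: (log) => { captured.push(log); } },
  ],
};

const logs: Log[] = [
  { address: "0x06012c8cf97bead5deae237070f9587f8e7a266d", signature: "Transfer(address,address,uint256)", args: ["0xalice", "0xbob", "1"] },
  { address: "0x0000000000000000000000000000000000000001", signature: "Transfer(address,address,uint256)", args: ["0xalice", "0xcarol", "2"] },
  { address: "0x06012c8cf97bead5deae237070f9587f8e7a266d", signature: "Approval(address,address,uint256)", args: ["0xbob", "0xcarol", "1"] },
];

console.log(processLogs(manifest, logs)); // 1
```

Only the first log is handled: the second comes from a different contract, and the third is an event for which the manifest declares no handler.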
From an architecture standpoint, The Graph is based on the following components:
Skipping a few steps for the sake of simplicity, let’s illustrate how The Graph protocol works. The first step is to define a subgraph manifest in YAML, such as the following:
```yaml
specVersion: 0.0.1
schema:
  file: ./schema.graphql
dataSources:
  - kind: ethereum/contract
    name: MyERC721Contract
    source:
      address: "0x06012c8cf97BEaD5deAe237070F9587f8E7A266d"
      abi: ERC721
    mapping:
      kind: ethereum/events
      apiVersion: 0.0.1
      language: wasm/assemblyscript
      entities:
        - Token
      abis:
        - name: ERC721
          file: ./abis/ERC721ABI.json
      eventHandlers:
        - event: Transfer(address,address,uint256)
          handler: handleTransfer
      file: ./mapping.ts
```
As you can see from the definition, this subgraph captures the data emitted by the Transfer event of a specific smart contract (MyERC721Contract). The structure of that event data is defined by the contract's ABI JSON file:
```json
[
  {
    "anonymous": false,
    "inputs": [
      { "indexed": true, "name": "_from", "type": "address" },
      { "indexed": true, "name": "_to", "type": "address" },
      { "indexed": true, "name": "_tokenId", "type": "uint256" }
    ],
    "name": "Transfer",
    "type": "event"
  }
]
```
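The `event` entry in the manifest, `Transfer(address,address,uint256)`, is the event's canonical signature: its name followed by the parameter types from the ABI, in order, without parameter names or spaces. A small sketch of that derivation, assuming only elementary types (tuple parameters would need extra expansion); the `AbiEvent` type and `eventSignature` helper are our own illustration, not a library API:

```typescript
// Hypothetical typing of an ABI event entry, matching the JSON above.
interface AbiInput {
  indexed: boolean;
  name: string;
  type: string;
}

interface AbiEvent {
  anonymous: boolean;
  inputs: AbiInput[];
  name: string;
  type: string;
}

// Canonical signature: name plus comma-separated parameter types.
// Only elementary types are handled; tuples would need expansion.
function eventSignature(entry: AbiEvent): string {
  return `${entry.name}(${entry.inputs.map((input) => input.type).join(",")})`;
}

const abi: AbiEvent[] = [
  {
    anonymous: false,
    inputs: [
      { indexed: true, name: "_from", type: "address" },
      { indexed: true, name: "_to", type: "address" },
      { indexed: true, name: "_tokenId", type: "uint256" },
    ],
    name: "Transfer",
    type: "event",
  },
];

for (const entry of abi.filter((e) => e.type === "event")) {
  console.log(eventSignature(entry)); // Transfer(address,address,uint256)
}
```

Hashing this string with Keccak-256 yields the `topics[0]` value that identifies Transfer logs on chain; that step needs a Keccak library and is left out here.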
The next step is to write a mapping function that transforms the Ethereum event data into the entities defined by the subgraph's schema:
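```typescript
// Event class generated from the contract's ABI
import { Transfer } from './types/abis/SomeContract'
// This is an example of an entity type generated from a
// subgraph's GraphQL schema
import { Token } from './types/schema'

export function handleTransfer(event: Transfer): void {
  // Generated getters are named after the ABI parameters (_tokenId, _to)
  let tokenID = event.params._tokenId.toHex()
  let token = new Token(tokenID)
  token.currentOwner = event.params._to

  // Persist the entity in the Graph node's store
  token.save()
}
```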
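To see what the mapping does to the stored data, the sketch below replays a few transfers against plain TypeScript stand-ins. The `Token` class, the `Map`-based store and `save` are hypothetical substitutes for the generated entity and the graph-ts store, and the hex ID formatting is our own choice; the point is only that entities are keyed by token ID, so each new Transfer of the same token overwrites its `currentOwner`.

```typescript
// Hypothetical stand-in for the entity class generated from the schema.
class Token {
  id: string;
  currentOwner = "";
  constructor(id: string) {
    this.id = id;
  }
}

// Hypothetical stand-in for the store: entities keyed by ID.
const store = new Map<string, Token>();

// Saving an entity whose ID already exists replaces the previous version.
function save(token: Token): void {
  store.set(token.id, token);
}

// Mirrors handleTransfer: the hex token ID becomes the entity ID and the
// recipient becomes the current owner. Real uint256 IDs would need BigInt.
function handleTransfer(tokenId: number, to: string): void {
  const token = new Token("0x" + tokenId.toString(16));
  token.currentOwner = to;
  save(token);
}

handleTransfer(5, "0xalice");
handleTransfer(5, "0xbob");
handleTransfer(255, "0xcarol");

console.log(store.size); // 2
console.log(store.get("0x5")?.currentOwner); // 0xbob
```

Token 5 was transferred twice but is stored once, with its latest owner, which is what the `tokens` query later returns.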
At this point, the subgraph can be deployed and we can start listening for events from the Ethereum blockchain. The current version of The Graph can collect Ethereum data using Infura, a local Ethereum node, or Ganache. The following command starts a Graph node that reads data via the Infura API:
```bash
cargo run -p graph-node --release -- \
  --postgres-url postgresql://<USERNAME>:<PASSWORD>@localhost:5432/<POSTGRES_DB_NAME> \
  --ethereum-rpc <ETHEREUM_NETWORK_NAME>:https://mainnet.infura.io \
  --ipfs 127.0.0.1:5001 \
  --debug
```
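If you launch the node from a script rather than a shell, it can help to build the arguments programmatically. The sketch below is a hypothetical helper (`NodeConfig` and `graphNodeArgs` are ours) that assembles the same flags as the command above. It percent-encodes the database credentials, since characters such as `@`, `:` or `/` in a password would otherwise break the connection URL. The network name and credentials in the example are placeholders.

```typescript
// Hypothetical configuration for the graph-node command shown above.
interface NodeConfig {
  pgUser: string;
  pgPassword: string;
  pgDb: string;
  network: string; // label for the Ethereum network
  rpcUrl: string;
  ipfs: string;
}

// Assemble the cargo arguments; credentials and database name are
// percent-encoded so reserved characters cannot corrupt the URL.
function graphNodeArgs(c: NodeConfig): string[] {
  const user = encodeURIComponent(c.pgUser);
  const pass = encodeURIComponent(c.pgPassword);
  const db = encodeURIComponent(c.pgDb);
  return [
    "run", "-p", "graph-node", "--release", "--",
    "--postgres-url", `postgresql://${user}:${pass}@localhost:5432/${db}`,
    "--ethereum-rpc", `${c.network}:${c.rpcUrl}`,
    "--ipfs", c.ipfs,
    "--debug",
  ];
}

const args = graphNodeArgs({
  pgUser: "graph",
  pgPassword: "p@ss:word",
  pgDb: "graph-node",
  network: "mainnet",
  rpcUrl: "https://mainnet.infura.io",
  ipfs: "127.0.0.1:5001",
});
console.log(["cargo", ...args].join(" "));
```

Because the result is an argument array rather than a single string, it can be passed to Node's `child_process.spawn("cargo", args)` without any shell quoting.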
Finally, we can query the data using typical GraphQL syntax:
```graphql
{
  tokens {
    id
    currentOwner
  }
}
```
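The same query can also be sent from code as a standard GraphQL-over-HTTP request: a POST whose JSON body carries the query string. In this sketch the endpoint URL is a hypothetical placeholder (use whatever URL your Graph node reports for the deployed subgraph), `buildRequest` and `fetchTokens` are our own helpers, and the global `fetch` assumes Node 18 or later.

```typescript
// Hypothetical endpoint; replace with your Graph node's subgraph URL.
const ENDPOINT = "http://localhost:8000/subgraphs/name/example/erc721";

const TOKENS_QUERY = `{
  tokens {
    id
    currentOwner
  }
}`;

interface TokenRow {
  id: string;
  currentOwner: string;
}

interface TokensResponse {
  data?: { tokens: TokenRow[] };
  errors?: { message: string }[];
}

// GraphQL over HTTP: POST a JSON body with a "query" field.
function buildRequest(query: string): { method: string; headers: Record<string, string>; body: string } {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  };
}

// Send the query and return the tokens, surfacing GraphQL errors.
async function fetchTokens(endpoint: string): Promise<TokenRow[]> {
  const res = await fetch(endpoint, buildRequest(TOKENS_QUERY));
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const json = (await res.json()) as TokensResponse;
  if (json.errors?.length) throw new Error(json.errors.map((e) => e.message).join("; "));
  return json.data?.tokens ?? [];
}
```

Calling `fetchTokens(ENDPOINT)` resolves to the list of tokens with their current owners, or rejects with the error messages the server returned.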
Developers have already deployed subgraphs for popular protocols and DApps using The Graph. You can browse many of them in the Graph Explorer, which provides a slick user interface for executing GraphQL queries against specific smart contracts or DApps.
The Graph is a very good step toward addressing one of the most important challenges of Web 3.0 applications. By leveraging established technologies such as IPFS, Postgres and GraphQL, The Graph is lowering the barrier to entry for developers querying blockchain data. To make things even more exciting, the current version of The Graph was recently open sourced and is being actively developed. Although still in very early stages, The Graph seems to have the technological foundation to become one of the most important protocols of the Web 3.0 movement.