Mano Rao
The blockchain tsunami is upon us! What started in the rebellious and esoteric world of cryptocurrencies in 2009 has now permeated the walls and halls of mainstream enterprises. The question being asked these days is not whether enterprises should adopt blockchain, but when and how to do it? And what should the role of enterprise IT be in helping an enterprise leverage blockchain? This article explores these questions and aims to provide guidance on how to address them. I have purposely tried to stay away from discussing the specifics of cryptocurrencies, or the cryptographic and algorithmic details of blockchain technology. Instead, I have looked at blockchain from the viewpoint of enterprise architecture and application development. I have tried to draw parallels between blockchain and relational databases, as relational databases are still the workhorse for enterprise applications.
The most significant innovation delivered by blockchain technology is the enablement of Decentralized Transaction Processing Applications. The first such application was Bitcoin, released in January 2009, which enabled the transfer of cryptocurrency between users without the need for a bank to act as a trusted intermediary. But there are many other enterprise use cases besides cryptocurrencies that blockchain technology can be applied to. It is worth delving a little deeper into the decentralized computing model to understand its relevance for enterprise applications.
Centralized Computing: The common definition of Centralized Computing (e.g. Wikipedia) focuses on the location of the application code and data: applications whose code and data reside on a single computer are generally defined as Centralized Computing applications. For the purposes of understanding blockchain, a different definition of this term, with a focus on ownership/control of the application code and data, is more useful. With this in mind, Centralized Computing can be defined as a model where the data and the business logic of an application are independently controlled by a single entity. All the users of the application trust this entity with the correctness and integrity of the code and the data. Most applications we see today adhere to this model.
Distributed Computing: The common definition of this term (e.g. Wikipedia) also focuses on the location of the code and data. Distributed computing is a model in which multiple servers (nodes) are used to execute a single logical task or application. The goals of distributed computing are scalability and fault tolerance. As long as a single entity controls the data and business logic of the application, it is still considered a centralized computing application per the definition above, even though multiple servers are involved. Most applications we see today follow the Centralized Computing model and are implemented as Distributed Computing applications.
Decentralized Computing: In this model, no single entity independently controls the application code or the data. Multiple independently controlled nodes each have an exact copy of the data and of the application code. Early examples of decentralized applications are Napster, BitTorrent, and InterPlanetary File System (IPFS). These applications have primarily been in the peer-to-peer file sharing space.
Most business applications, though, are transaction processing applications. They depend on the execution platform to provide the properties of Atomicity, Consistency, Isolation, and Durability (ACID) to correctly execute their business logic.
Decentralized Transaction Processing is a special case of Decentralized Computing with one additional property: every valid transaction is executed in exactly the same order on each node, and operates against the copy of the data maintained locally at that node.
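The ordering property can be illustrated with a minimal sketch. It assumes a toy account model in which each node replays the same ordered list of transfers against its own copy of the balances. The names `Transfer` and `apply_transactions` are hypothetical and do not come from any blockchain platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """A hypothetical transaction: move `amount` units from `sender` to `receiver`."""
    sender: str
    receiver: str
    amount: int


def apply_transactions(initial_balances, transactions):
    """Replay transactions in order against a local copy of the state.

    A transfer that would overdraw the sender is treated as invalid and skipped.
    """
    balances = dict(initial_balances)  # each node works on its own copy
    for tx in transactions:
        if balances.get(tx.sender, 0) >= tx.amount:
            balances[tx.sender] -= tx.amount
            balances[tx.receiver] = balances.get(tx.receiver, 0) + tx.amount
    return balances


genesis = {"alice": 10, "bob": 0, "carol": 0}
ordered_log = [
    Transfer("alice", "bob", 10),
    Transfer("alice", "carol", 10),  # tries to spend the same funds a second time
]

# Two independent nodes replaying the same ordered log reach the same state.
node_a = apply_transactions(genesis, ordered_log)
node_b = apply_transactions(genesis, ordered_log)

# A node that processed the same transactions in a different order would diverge.
node_c = apply_transactions(genesis, list(reversed(ordered_log)))
```

In this sketch `node_a` and `node_b` agree that bob received the funds, while `node_c` would credit carol instead. This is why all nodes must agree on a single order before executing transactions.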
Bitcoin was the first public implementation of a decentralized transaction processing application. It enabled the transfer of value (bitcoins) from one party to another with a guarantee that no party can “double spend” a given unit of value (bitcoin), all without the need for a trusted third party. While the initial buzz about Bitcoin was primarily focused on the cryptocurrency it introduced, attention gradually shifted to the blockchain technology underlying Bitcoin and how it could be used to enable other interesting use cases.
So blockchain is interesting because it enables a new class of applications we call Decentralized Transaction Processing Applications, or Dapps.
There are many blockchain platforms available today. The description of the components below is not intended to describe any one of these platforms exactly. Also, most of the description below applies to public blockchain implementations. Permissioned blockchains are discussed later in this article. Public and permissioned blockchains are defined in the section below.
At the heart of a blockchain platform is a Distributed Ledger (DL), an append-only ledger of all valid transactions executed on the blockchain network. Transactions are grouped into cryptographically sealed blocks that are cryptographically chained to each other (hence the name blockchain). Each full node in the network has an exact copy of this ledger.
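The chaining idea can be sketched in a few lines. This is a simplified illustration, assuming blocks are plain dictionaries hashed with SHA-256 over a JSON encoding. Real platforms use their own block formats, and the helper names here (`block_hash`, `append_block`, `is_consistent`) are hypothetical.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over a canonical JSON encoding of the block."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(ledger, transactions):
    """Append a block that records the hash of its predecessor."""
    prev_hash = block_hash(ledger[-1]) if ledger else "0" * 64
    ledger.append({"prev_hash": prev_hash, "transactions": transactions})


def is_consistent(ledger):
    """Check that every block still points at the hash of the block before it."""
    return all(
        ledger[i]["prev_hash"] == block_hash(ledger[i - 1])
        for i in range(1, len(ledger))
    )


ledger = []
append_block(ledger, ["alice pays bob 5"])
append_block(ledger, ["bob pays carol 2"])
append_block(ledger, ["carol pays dave 1"])
intact = is_consistent(ledger)

# Rewriting history in an early block breaks the link stored in the next block.
ledger[0]["transactions"] = ["alice pays mallory 5"]
tampered = is_consistent(ledger)
```

Because each block embeds the hash of the previous one, a change to any earlier block is detectable by anyone holding a copy of the ledger.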
Functionally, the DL is analogous to the transaction log of a relational database, which maintains records of all transactions that modify the database state.
Storing application state on the blockchain has been a controversial topic. When Bitcoin was first released it did not support storage of any application state on the blockchain other than that required by the cryptocurrency transfer protocol. After much debate, Bitcoin Core release 0.9 made OP_RETURN outputs standard, which allowed a small data payload (initially 40 bytes, raised to 80 bytes in a later release) to be included as a transaction output. Creative usage of this limited storage has emerged. Proof-of-Existence Dapps store a hash of the digital artifact in this field; the actual artifact is typically stored in some Distributed Hash Table (DHT) based storage like IPFS. Colored Coin protocols implemented on the Bitcoin network store the state they need as encoded strings in these 80 bytes.
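The Proof-of-Existence pattern can be sketched as follows. This is a minimal illustration that assumes SHA-256 as the fingerprint function and only computes the payload. It does not build or broadcast a Bitcoin transaction, and `fingerprint` and `matches` are hypothetical helper names.

```python
import hashlib

OP_RETURN_LIMIT_BYTES = 80  # payload size mentioned in the text above


def fingerprint(artifact: bytes) -> bytes:
    """Return the SHA-256 digest that would be recorded on the blockchain."""
    return hashlib.sha256(artifact).digest()


def matches(artifact: bytes, recorded_digest: bytes) -> bool:
    """Later, anyone holding the artifact can check it against the recorded digest."""
    return fingerprint(artifact) == recorded_digest


document = b"Contract between Alice and Bob"
digest = fingerprint(document)

# A SHA-256 digest is 32 bytes, so it fits within the payload limit.
fits = len(digest) <= OP_RETURN_LIMIT_BYTES
```

Only the digest goes on the blockchain. The artifact itself stays off-chain, and any change to it produces a different digest.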
Ethereum, another popular blockchain platform, introduced the ability to store application-defined data structures directly on the blockchain. The only way to make changes to these data structures is by executing transactions on the Ethereum network. Each node has the exact same copy of the state.
Functionally, state on the blockchain is similar to state stored in relational databases. The complexity of the data structures stored in relational databases, and the efficiency with which data can be queried and retrieved from them, are arguably greater than what blockchain implementations provide at the present time.
Every blockchain platform provides a mechanism for developers to define new transaction logic (or protocols). Bitcoin executes transactions using a stack-based virtual machine which implements a limited set of instructions (opcodes). Bitcoin allows specification of custom transaction logic by allowing opcodes to be embedded in the transaction message. Ethereum has a more powerful, Turing-complete, stack-based virtual machine called the Ethereum Virtual Machine. Ethereum allows transaction code (Smart Contracts) to be housed on the blockchain network and be invoked by transactions from users or other Smart Contracts. Deployment of Smart Contracts on Ethereum is itself accomplished with a transaction, and hence is subject to the same rules (consensus and transparency) that all transactions are subject to. In addition, the Smart Contract code is stored in Ethereum’s DL and hence is immutable.
Functionally, Smart Contracts are analogous to stored procedures and triggers in the relational database world. They allow access to state to be encapsulated by custom logic. In this sense, the automation of business logic enabled by Smart Contracts is not a revolutionary development. It is the decentralized nature of Smart Contracts, and their immutability, that are new.
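To give a feel for the stack-based execution model mentioned above, here is a toy interpreter. The instruction names (`PUSH`, `ADD`, `EQUAL`) are invented for this sketch and are not real Bitcoin Script or Ethereum Virtual Machine opcodes. It only shows how a program can be evaluated by pushing and popping values on a stack.

```python
def run(program):
    """Execute a list of instructions on a stack and return the final stack.

    The instruction set is hypothetical and exists only for illustration.
    """
    stack = []
    for instruction in program:
        op, *args = instruction
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "EQUAL":
            right, left = stack.pop(), stack.pop()
            stack.append(left == right)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack


# "Is 2 + 3 equal to 5?" expressed as a stack program.
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 5), ("EQUAL",)]
result = run(program)
```

Because every node runs the same deterministic program against the same inputs, every node arrives at the same result.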
Traditional transactional applications depend on a Transaction Manager to provide Atomicity, Consistency, Isolation, and Durability (ACID) when accessing shared state. One of the services provided by a Transaction Manager is a strict ordering of all transactions accessing shared state. Decentralized Transaction Processing applications need the same ACID guarantees to implement useful business services (like transfer of cryptocurrency). In other words, they need a Decentralized Transaction Manager, which among other things will ensure a strict ordering of all transactions executed on the blockchain. The process of arriving at this strict ordering of transactions in a blockchain platform is referred to as consensus or mining.
Consensus in blockchain is analogous to the built-in transaction manager functionality of relational database management systems, which determines the strict ordering of transactions accessing database state.
Every participant on a blockchain network needs a public key/private key pair. All transactions are digitally signed by the sender.
In public blockchains, anyone can participate in the blockchain network. There is no central authority that grants access to the blockchain network or creates the public/private keys that represent a user.
In permissioned blockchains, a central authority, trusted by all participants, controls identity and access management for the blockchain network. While this may seem antithetical to the core blockchain principle of eliminating a trusted third party, permissioned blockchains have value in situations where the participants already have a partial trust relationship, e.g. in Business-to-Business contexts.
Unlike relational databases, which provide a rich role-based access control framework to govern access to the state in the database, blockchains, by design, allow anyone to read the entire DL contents including the state. Any blockchain state that needs restricted visibility should be encrypted or hashed before storing it on a blockchain. Permissioned blockchains have introduced mechanisms to somewhat limit this transparency. This is discussed further below.
As mentioned earlier, Bitcoin, and other cryptocurrency platforms, enable the transfer of value between parties, and have the entire network trust that the transfer has indeed taken place (guarantee of no “double spend”) without the need for an intermediary. Namecoin is another blockchain network that allows anyone to register key/value pairs and transfer them to others in a decentralized manner. In both these cases, elimination of the trusted intermediary eliminates the potential betrayal of trust by this trusted third party. It also allows the users to avoid paying commissions that are usually charged by such trusted third parties. (Users do need to pay transaction fees on public blockchains to have their transaction confirmed. There is more discussion on transaction fees later in this article.)
Relational databases can be replicated for performance or high availability and fault tolerance purposes, but all replicas of the database are controlled by a single entity. A bad actor at this single entity could intentionally subvert the database state (e.g. siphon off money from an account) without the knowledge or consent of the users. A hacker that gains elevated privileges to the database could subvert the database state without the knowledge or consent of the users or of the entity that controls the database.
One of the properties of a blockchain that is often touted is the immutability of the DL — once a transaction is included into a block on the blockchain, it can never be altered or undone by anyone. This is a key property of a blockchain but there are some caveats that are worth discussing in more detail.
Most public blockchains currently use a Proof-of-Work (PoW) protocol for implementing consensus. The primary vulnerability of PoW is the “51% attack”. If 51% or more of the consensus power (hashing power in the case of Bitcoin) is held by a single entity or a single group of colluding entities, they can retroactively alter the set of transactions and/or the order of transactions included in the DL, thus mutating the DL. Please see https://en.bitcoin.it/wiki/Weaknesses for details on this vulnerability (section titled “Attacker has a lot of computing power”) and other weaknesses of the Bitcoin platform.
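The reason hashing power matters can be seen in a toy Proof-of-Work sketch. It assumes a simplified puzzle: find a nonce such that the SHA-256 hash of the block data plus the nonce starts with a given number of zero hex digits. Bitcoin's actual rules (block header format, numeric target, double hashing) are more involved, and the function names here are hypothetical.

```python
import hashlib


def pow_hash(block_data: str, nonce: int) -> str:
    """Hash the block data together with a candidate nonce."""
    return hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()


def mine(block_data: str, difficulty: int):
    """Search for a nonce whose hash starts with `difficulty` zero hex digits."""
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = pow_hash(block_data, nonce)
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking a solution takes a single hash, unlike finding one."""
    return pow_hash(block_data, nonce).startswith("0" * difficulty)


nonce, digest = mine("block 1: alice pays bob 5", difficulty=4)
valid = verify("block 1: alice pays bob 5", nonce, difficulty=4)
```

Finding a nonce takes many attempts while verifying one takes a single hash. Rewriting an old block means redoing this work for that block and every block after it, which is only feasible for someone who controls most of the hashing power.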
It is possible that a node on a blockchain network may, at times, have to throw away a block (or a set of blocks) in its copy of the ledger and accept a new block(s). This is a consequence of establishing decentralized consensus in the presence of network latencies. As a result, immutability is not immediately guaranteed at a node as soon as a block is included in its ledger. Many Bitcoin clients will consider a transaction as “unconfirmed” until the block the transaction is included in is 6 blocks deep. Hence immutability is not immediate — the time to immutability depends on the time interval for block creation (10 minutes on average for bitcoin, 15 seconds on average for Ethereum), and the number of blocks you are willing to wait. For a more detailed discussion see https://en.bitcoin.it/wiki/Confirmation.
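As a rough worked example of the arithmetic, the expected wait is the average block interval multiplied by the number of blocks one chooses to wait for. The block intervals below are the averages quoted above. The figure of 12 blocks for Ethereum is an assumption made only for illustration, and the function name is hypothetical.

```python
def time_to_immutability_seconds(block_interval_seconds: int, blocks_to_wait: int) -> int:
    """Average wait before a transaction is treated as effectively immutable."""
    return block_interval_seconds * blocks_to_wait


# Bitcoin: roughly 10 minutes per block, 6 blocks deep.
bitcoin_wait = time_to_immutability_seconds(10 * 60, 6)

# Ethereum: roughly 15 seconds per block; 12 blocks is an assumed choice.
ethereum_wait = time_to_immutability_seconds(15, 12)

bitcoin_wait_minutes = bitcoin_wait // 60
ethereum_wait_minutes = ethereum_wait // 60
```

Under these assumptions the average wait is about 60 minutes on Bitcoin and about 3 minutes on Ethereum. These are averages; the actual time varies from block to block.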
If full auditing of all create, update, and delete operations against all state in a relational database is enabled, the audit log should provide the same kind of information provided by the transactions stored in the DL. In that sense, relational databases can provide the same kind of immutability (or more correctly, a full accounting of the mutation of state). As explained earlier, the database and its audit log are controlled by a single entity, and hence are subject to the limitations of the centralized computing model.
All participants on a blockchain network can see all the transactions stored in the DL. This is analogous to a database where every user can read every record in the database. Transparency is one of the primary virtues of blockchain, allowing decentralized storage and access to the DL. But it does mean that only state that can be shared publicly should be stored on the blockchain, or the state should be encrypted or hashed before storing it on the blockchain.
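One common way to hash state before publishing it is a salted hash. The sketch below assumes SHA-256 and a random salt that is kept off-chain. Without a salt, short or predictable values could be guessed by hashing candidates. The helper names `commit` and `reveal_matches` are hypothetical.

```python
import hashlib
import secrets


def commit(value: bytes):
    """Return (salt, digest). Only the digest would be stored on the blockchain."""
    salt = secrets.token_bytes(16)  # kept private by the data owner
    digest = hashlib.sha256(salt + value).hexdigest()
    return salt, digest


def reveal_matches(value: bytes, salt: bytes, digest: str) -> bool:
    """A party given the value and salt can check them against the public digest."""
    return hashlib.sha256(salt + value).hexdigest() == digest


salt, public_digest = commit(b"invoice total: 1200 USD")
accepted = reveal_matches(b"invoice total: 1200 USD", salt, public_digest)
rejected = reveal_matches(b"invoice total: 9999 USD", salt, public_digest)
```

The digest is visible to every participant, but the underlying value is disclosed only to parties who are given the value and its salt.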
Public blockchains are symbiotic ecosystems. “Miners” provide the critical service of implementing the consensus protocol for the blockchain. Transaction users need the miners to validate and “confirm” their transactions. To incentivize the miners, users include a transaction fee with each transaction they place on the blockchain network. A miner collects the transaction fees included in all the transactions in each block that they mine and that is accepted by the network (this is in addition to any native cryptocurrency created and awarded to the winning miner).
Transaction fees on Bitcoin depend on the transaction message size, unlike credit card fees, which are a percentage of the transaction value. Complex transactions (e.g. multisig transactions) that have larger message sizes can incur higher transaction fees than a simple bitcoin transfer from one individual to another. Transaction fees on Ethereum depend on the computational complexity of the smart contract as well as on the amount of data stored on the blockchain as part of each transaction.
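The difference between these fee models can be made concrete with simple arithmetic. All sizes, rates, and prices below are illustrative assumptions, not current market values, and the function names are hypothetical. The sketch only shows which inputs each fee depends on.

```python
def bitcoin_fee_satoshis(tx_size_bytes: int, fee_rate_sat_per_byte: int) -> int:
    """Size-based fee: depends on message size, not on the value transferred."""
    return tx_size_bytes * fee_rate_sat_per_byte


def ethereum_fee_gwei(gas_used: int, gas_price_gwei: int) -> int:
    """Computation-based fee: gas consumed multiplied by the price per unit of gas."""
    return gas_used * gas_price_gwei


def card_fee_cents(amount_cents: int, percent: int) -> int:
    """Value-based fee: a percentage of the amount transferred."""
    return amount_cents * percent // 100


# Assumed sizes: a simple transfer vs. a larger multisig transaction.
simple_fee = bitcoin_fee_satoshis(tx_size_bytes=250, fee_rate_sat_per_byte=100)
multisig_fee = bitcoin_fee_satoshis(tx_size_bytes=500, fee_rate_sat_per_byte=100)

# Assumed gas figures: a plain transfer vs. a more complex contract call.
transfer_fee = ethereum_fee_gwei(gas_used=21_000, gas_price_gwei=20)
contract_fee = ethereum_fee_gwei(gas_used=100_000, gas_price_gwei=20)

# A percentage fee grows with the amount; the fees above do not.
small_card_fee = card_fee_cents(amount_cents=1_000, percent=3)
large_card_fee = card_fee_cents(amount_cents=100_000, percent=3)
```

With these assumed numbers, the multisig transaction costs twice as much as the simple transfer regardless of how much bitcoin is moved, while the percentage-based fee scales with the amount.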
As of this writing, the average transaction fee on the Bitcoin network is more than $11, and it had reached $55 on Dec 22, 2017. The average Ethereum fee is currently $1.50, and it had reached $4.15 on Jan 10, 2018. Transaction fees are paid by the user and can be a significant factor.
The computational cost and complexity of public blockchains limit the scalability of transaction throughput on public blockchains. This has led to investment in developing permissioned blockchains. The most well-known example is the Hyperledger project hosted by the Linux Foundation. The mission of this project is to “create an enterprise grade, open source distributed ledger framework and code base, upon which users can build and run robust, industry-specific applications, platforms and hardware systems to support business transactions.”
The 5 key differences between public blockchain platforms and permissioned blockchains (Hyperledger projects in particular) that I would like to highlight are:
1. Security and Membership Services: a central trusted authority exists that provides identity and access control management. Only parties authorized by this central trusted authority can participate in the blockchain network.
2. Privacy and Confidentiality: Hyperledger Fabric tries to reduce the issue of full transparency of the DL by creating the notion of channels. An extract from the Hyperledger Fabric docs provides this explanation: “Private channels are restricted messaging paths that can be used to provide transaction privacy and confidentiality for specific subsets of network members. All data, including transaction, member and channel information, on a channel are invisible and inaccessible to any network members not explicitly granted access to that channel.”
3. Consensus: “Hyperledger Architecture, Volume 1: Introduction to Hyperledger Business Blockchain Philosophy and Consensus” is an excellent article that explains the various consensus approaches that are being explored in Hyperledger projects. The following extract from the article explains why these projects avoid Proof of Work: “The operating assumption for Hyperledger developers is that business blockchain networks will operate in an environment of partial trust. Given this, we are expressly not including standard Proof of Work consensus approaches with anonymous miners. In our assessment, these approaches impose too great a cost in terms of resources and time to be optimal for business blockchain networks.” Hyperledger projects are exploring a simple Kafka-based ordering service in Hyperledger Fabric, RBFT in Hyperledger Indy, Sumeragi in Hyperledger Iroha, and PoET in Hyperledger Sawtooth. The article referenced above explains the details and the pros and cons of each of these approaches.
4. State: Hyperledger projects seem to be making the storage and query of decentralized state more efficient. An extract from the Hyperledger Fabric docs provides this description: “State database options include LevelDB and CouchDB. LevelDB is the default state database embedded in the peer process and stores chaincode data as key/value pairs. CouchDB is an optional alternative external state database that provides additional query support when your chaincode data is modeled as JSON, permitting rich queries of the JSON content.”
5. Transaction fees: unlike public blockchain networks that operate in a trustless environment, permissioned blockchain networks operate in a partial-trust environment and do not need transaction fees to incentivize miners or to prevent abuse.
Blockchain is a platform for building decentralized transaction processing applications. The platform provides the following capabilities:
1. The state of the application is decentralized (stored at every node of the blockchain network) and transparent (readable by all participants)
2. The transaction processing logic of the application is decentralized (executed at every node of the blockchain network)
3. All valid transactions are stored in a decentralized ledger where:
4. The point at which a Dapp transaction is committed (becomes immutable) is not as straightforward as for centralized transaction processing applications. Commitment of a transaction is asynchronous to the submission of a transaction and requires a time-to-immutability interval that depends on the block creation frequency and the user's risk tolerance level.
5. Transaction fees are incurred by the sender of the transaction.
As the summary indicates, blockchain is a significantly more complex platform to develop on and is computationally more intensive at runtime than traditional application platforms. Transparency of state and ledger implies that it is not possible to provide the multi-layered role-based access control that traditional enterprise applications typically require. Transaction fees can become significant and are borne by initiators of the transactions. So care is needed to decide whether the unique properties offered by blockchain platforms meet the needs of the application, and whether the benefits are worth the increased complexity and performance limitations. “The Business Blockchain”, a book by William Mougayar with a foreword by Vitalik Buterin, the founder of Ethereum, puts it this way: “Understanding the tradeoffs and wise choices involving databases and blockchains is a key competency that needs to be perfected. Finding the right balance between what a blockchain is particularly good at, and marrying the derived benefits with back-end databases or existing applications is part of the magic that you need to continuously seek out. We are still learning what these boundaries are, and like the pendulum, we might swing excessively toward one side, then to another before finding a middleground.”
Originally published at https://www.linkedin.com on February 5, 2018.