Varun

@startuphackers

The Missing Blockchain User Guide

April 14th 2018

Consider this: it would have cost $78,887.25 to store just 3 x 1 MB pictures on the leading blockchain recently. The key lesson from Satoshi Nakamoto, the anonymous creator of Bitcoin, is how an incentive system can be designed which can motivate various participants to work in a mutually beneficial way. While blockchain is a building block for such game theory-driven systems, it does not have universal applicability and needs to be used only as a state transfer machine or where full transparency is needed. This post dives deep into debunking the “magic” of blockchain, and puts it forward as an extension of time-series databases, as a way to manage trust.

Introduction

I am Prometheus. I am anonymous, so what I am saying must be magical ? :)

Well, heck no. I am just one of the thousands of people who have worked in the back-office trading data warehouses of Wall Street, where the concepts of “time” and “audit” are well established. Every trade has a history attached to it, and using bi-temporal data modelling, it is possible to see the full audit trail associated with every single update to that transaction. Blockchain extended that concept into the public domain using cryptography, which opened up new use cases, such as Bitcoin, and later an entirely new blockchain, Ethereum, to support even more use cases.

It allowed an amateur like me to dig into the transaction history of one of the ERC-20 based tokens and publish an analysis below:

But that’s what I have done for over a decade in the world of time-series databases as well. There is no difference between the two systems if you have access to both. Both are meant to present an accurate historical record and to trace back the history; the difference is that centralized time-series databases are maintained with a mix of manual processes, whereas blockchains are maintained by self-driving, game-theory-driven incentives.

For this reason, the “blockchain” concept had to have been born out of that world, by someone who understood time-series databases, was frustrated with the 2008 crash, and given the nature of the industry, did not want to reveal his identity. A former academic, definitely a PhD, who worked as a quant and maybe who got laid off in the 2008 financial crisis. There is no “we” — Satoshi had to be a single person, born out of that financial world.

But he or she forgot to tell people not to use blockchain instead of databases, that it’s not a Swiss Army knife, and that it in fact has a very specific use case: maintaining trust. This is that missing user guide. Don’t use blockchain unless it makes sense; and please don’t talk about “putting things on the blockchain” when you have never worked with databases before.

Chapter 1: Blockchain Recap

Blockchain is normally viewed as a distributed, decentralized public ledger, made up of “blocks” (or batches) of transactions “chained” together (one pointing to the other, using hashes). It functions by replicating every validated transaction across every node on the network. There is no central authority — even if a few nodes go down, it doesn’t disrupt or make the data unavailable.
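To make the “chained together using hashes” idea concrete, here is a minimal sketch in Python. It is a toy model, not how Bitcoin or Ethereum actually serialize blocks: the field names (`index`, `prev_hash`, `transactions`) and the helper functions are hypothetical, chosen only for illustration.

```python
import hashlib
import json

def block_hash(block):
    # Hash a canonical JSON encoding of the block's contents.
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, transactions):
    # Each new block stores the hash of the block before it.
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    block = {"index": len(chain), "prev_hash": prev_hash, "transactions": transactions}
    chain.append(block)
    return block

def is_valid(chain):
    # The chain is consistent if every block points at the hash of its predecessor.
    return all(curr["prev_hash"] == block_hash(prev) for prev, curr in zip(chain, chain[1:]))

chain = []
append_block(chain, ["alice pays bob 5"])
append_block(chain, ["bob pays carol 2"])
append_block(chain, ["carol pays dave 1"])
valid_before = is_valid(chain)

# Rewriting an old transaction changes that block's hash and breaks the link after it.
chain[0]["transactions"][0] = "alice pays bob 500"
valid_after = is_valid(chain)
```

In this sketch, editing an early block is detectable because the next block still carries the old hash. On a real network, every node holds a copy and runs a check of this kind independently.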

As the network grows, and as the number of transactions grows, the overhead on each node to maintain a state of the blockchain grows significantly as well.

In Bitcoin, nodes running a certain piece of software can work to validate newly created transactions by solving a cryptographic puzzle, and then compete to add them to the blockchain, thereby earning reward tokens (bitcoin, in this case) for the “work” they performed. The resources required to solve this cryptographic puzzle (which keeps getting tougher) are measured in terms of electricity/resources consumed, which is what underpins the scarcity and value of bitcoin.
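The “cryptographic puzzle” can be sketched as a brute-force search for a number (a nonce) that makes a hash start with a given number of zeros. This is a simplified illustration only: real Bitcoin mining hashes a block header twice with SHA-256 and compares against a numeric target, and the function name and inputs below are hypothetical.

```python
import hashlib

def mine(block_data, difficulty):
    # Try nonces until the hex digest starts with `difficulty` zeros.
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

nonce, digest = mine("batch of transactions", difficulty=3)

# Anyone can verify the work with a single hash, even though finding it took many attempts.
check = hashlib.sha256(f"batch of transactions{nonce}".encode("utf-8")).hexdigest()
```

Each extra zero of difficulty multiplies the expected number of attempts by 16 in this toy version, which is the sense in which the puzzle “keeps getting tougher”.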

There are other blockchains which work on other principles — but the core idea of a decentralized, self-driven, reward token-based system which uses cryptography underpins all of them.

Chapter 2: Sample costs of an “Airbnb on a Blockchain” venture from a user point of view

These figures are based on a median gas price of 30 gwei and the all-time-high ETH/USD price of 1 ETH = 1402.44 USD, modelled using Danny Ryan’s spreadsheets (linked in the appendix below).

It would have cost $78,887.25 to store 3 pictures (1MB each) on the Ethereum blockchain recently. Ethereum really shines as a state transfer machine, and that is its only financially viable use case. For all other purposes which involve data storage or computation, those tasks are best left “off-chain”, to regular centralized databases/servers.

$0.02 to “tweet” using a blockchain-driven app ?

Source: https://blocksplain.com/2018/04/17/peepeth-decentralized-twitter/

For example, consider this — a version of Twitter on the blockchain, where people have to pay transaction costs to post messages: ~$0.02 per “tweet”.

How do such products, which now charge for what people get for free elsewhere, break through beyond the initial crypto community?

Don’t use Ethereum for computation or storage unless necessary

The Ethereum blockchain is financially unviable for consumers/end users purely as a datastore or computation engine. Consider these regular storage and computation scenarios on a blockchain versus what Airbnb charges for the same data entry (hint: it’s $0). Compare these costs against AWS as well.

Here are the core functions which such a service needs to support:

  • Create a profile
  • List a room (with pictures, and description)
  • Bring viewers to that posting/“create the market”
  • Set availability
  • Chat
  • Facilitate payment
  • System to leave a review and view reviews left by others
  • Incentive to share and refer

Upload 3 pictures of your property as a host (1MB each): $78,887.25

This costs $0 to the host on Airbnb. But with such staggering costs on Ethereum, this only makes sense if this was the room Mona Lisa slept in prior to the day her painting was created.
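The $78,887.25 figure can be reproduced with a few lines of arithmetic. The sketch below uses the gas price and ETH price quoted in this post; the storage cost of 20,000 gas per 32-byte word and the convention 1 MB = 1,000,000 bytes are assumptions that are not stated in the post, chosen because they are consistent with the quoted total.

```python
from decimal import Decimal

GAS_PRICE_GWEI = Decimal(30)      # median gas price used in this post
ETH_USD = Decimal("1402.44")      # all-time-high ETH price used in this post
GWEI_PER_ETH = Decimal(10) ** 9   # 1 gwei = 0.000000001 ETH

# Assumptions, not stated in the post:
GAS_PER_STORAGE_WORD = 20_000     # gas to store one new 32-byte word
BYTES_PER_WORD = 32
BYTES_PER_MB = 1_000_000

def storage_cost_usd(num_bytes):
    words = -(-num_bytes // BYTES_PER_WORD)   # ceiling division
    gas = words * GAS_PER_STORAGE_WORD
    fee_eth = gas * GAS_PRICE_GWEI / GWEI_PER_ETH
    return fee_eth * ETH_USD

three_pictures = storage_cost_usd(3 * BYTES_PER_MB)
```

Under these assumptions the three pictures need 1,875,000,000 gas, or 56.25 ETH, which at 1402.44 USD per ETH comes to the $78,887.25 quoted above.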

Write a brief, 50 word host review: $50

This costs $0 to either the host or the guest on Airbnb. If it costs money to write a review, why would anyone write reviews ? How does the system work without reviews ?

Fill (save) out your bio (100 words): $100

This costs $0 to either the host or the guest on Airbnb. Why fill this out if it costs money — in which case, again how do you build trust on the system ?

Compute reservation cost (price per night X # of nights): $0.0002

This costs $0 to either the host or the guest on Airbnb. Please do this on the client side. Ethereum blockchain should not be used/abused for such use cases.

Submit a booking request: The “creation” of a booking request alone will cost $1.35

This costs $0 to either the host or the guest on Airbnb. But hey, if you are doing this on the blockchain, remember to add a “Blockchain fees” to the below table:

Every time a transaction is sent (host accepts or declines): $0.88

This costs $0 to either the host or the guest on Airbnb. If someone declines your request, you still have to pay close to $1 for that in the blockchain world.

Compare this with Airbnb, where each of the above costs is $0 to both the host and the guest. Some could make the case that hashes of user data could be stored on the blockchain instead of the data elements themselves, but what’s the point? The cost of that will still be greater than what Airbnb charges for such data entry: $0.

From a user point of view, what difference does it make where or how this data is stored, as long as reads and writes are instantaneous and the data is backed up and always available? The user is trusting the brand of Airbnb to manage this. It is marginally better for the user to have full control/independence over his/her dataset, but it is not essential for the value needed from the service.

An alternate world where the bulk of this data is hosted on a blockchain would mean an exceptionally slow user experience. The user would also have to trust the brand of the underlying blockchain (say, Ethereum). Not to mention that the core startup activity of actually building out a beautiful product experience and driving distribution would still need to be done; that is step 1 in any case. The competition with Airbnb is not on price, but on the overall experience.

Chapter 3: DApp it — Trust dilemma with storing user data on other users’ computers

Filecoin architecture

As shown in the previous section, using the Ethereum blockchain for either computation or storage in these use cases doesn’t make sense.

Blockchain startups get this. At the end of the day, regardless of what their service is called, how safely they encrypt it, and manage access to it — their alternate proposal calls for storing this data on other users’ computers.

If you were explaining to your grandma, this is what you would say.

Airbnb stores users’ data on databases it manages and controls. If there is an issue with your data, it is responsible.
“Airbnb on the blockchain” stores your data on other users’ computers. It does it efficiently and safely, but that is what decentralization means. It’s not cheaper, but it’s a question of choice.

Does it sound okay to your grandma Alice that your pictures, reviews, bio, and everything you write or upload is stored on Bob’s computer? Yes, it is behind a lock and a key, and Bob can’t read it, but in principle, does that sound like a good idea to your grandma, not to your inner crypto enthusiast? Here is the question in grandma’s head:

Do I trust Bob or do I trust Airbnb ?

That is the key trust issue of our times, which users will vote on with their wallets.

Do they trust centralized brands and access, or are they OK with decentralized storage? I don’t care either way. “Where” data is stored is irrelevant; what matters is the value the user gets.

At the end of the day, if the service is not as cheap, fast and efficient as what Airbnb already provides, any “Airbnb on the blockchain” startup is a no-go in any case. The service has to be 10x better for users to switch. So it will be fun to watch these blockchain startups raise tens of millions, and then realize they still have to build an overall product and value proposition which users want. The vast majority of users do not care “where” the data is stored, and in fact if you were to tell them that it is stored on Bob’s computer and not with the company itself, they would not be able to trust you.

Chapter 4: Blockchain does work exceptionally well as a system to transfer state (such as money)

On an example $350 total reservation cost, the guest and the host together end up paying about $63 in fees to Airbnb (let’s assume a maximum of 15% charged to the guest and 3% to the host).

If the transaction was done only in ETH, where both the host and the guest had an ETH wallet, then the transaction cost would have been around $0.88 to the guest (based on the cost to send 1 transaction on Ethereum with the above assumptions). To get it done a little faster, more transaction cost would have been paid.

If the guest was paying in USD to purchase ETH and place in his wallet; and then ETH transferred to the host’s ETH wallet, and then the host converted ETH to USD — there would have been additional transaction costs mixed in.

But this shows where the real utility of the Ethereum blockchain lies — as a state maintainer, which in this case happens to be the state of the guest’s account, and the state of the host’s account. For this use case, using the blockchain is far preferable.
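The comparison in this chapter is simple arithmetic, sketched here with the figures already used in this post (a $350 reservation, 15% plus 3% in Airbnb fees, a 21,000-gas transfer at 30 gwei, and 1 ETH = 1402.44 USD). The variable names are only illustrative, and the sketch ignores any USD-to-ETH conversion costs.

```python
from decimal import Decimal

reservation_usd = Decimal(350)
guest_fee_rate = Decimal("0.15")   # assumed maximum charged to the guest
host_fee_rate = Decimal("0.03")    # charged to the host

airbnb_fees_usd = reservation_usd * (guest_fee_rate + host_fee_rate)

# One plain ETH transfer: 21,000 gas at 30 gwei, priced at 1402.44 USD per ETH.
transfer_fee_eth = 21_000 * Decimal(30) / Decimal(10) ** 9
transfer_fee_usd = transfer_fee_eth * Decimal("1402.44")

ratio = airbnb_fees_usd / transfer_fee_usd
```

With these inputs the intermediary fees are roughly seventy times the on-chain transfer fee, and unlike the percentage fees, the transfer fee does not grow with the size of the reservation.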

As a subtle reminder, users don’t care about the underlying tech; they care about what they get out of it and the totality of their experience with the service.

Hybrid really means 99% off-chain + 1% on-chain for state management only

An Airbnb on blockchain would really be a mix of:

  • 99% centralized service, where profiles, reviews, pictures, etc are saved for cost and efficiency. Maybe, this is the part which transitions to a decentralized storage architecture, if parity is maintained for speed and cost.
  • 1% for value transfer between the guest and the host, which is done on the Ethereum blockchain, which provides tremendous savings in cost, along with an immutable and verified record of the value transfer which occurred.

Disrupting a well-established, centralized startup like Airbnb is thus practically unfeasible for a purely decentralized blockchain-driven startup. It will have to compete on product UI/UX, network effects, insurance/guarantee, and other factors. The overall product experience is what matters, and everything else being equal, Airbnb is significantly cheaper to use than a pure “Airbnb on a blockchain” play.

Chapter 5: Reward tokens, for example, what if Medium had a “clapcoin”

Reward tokens can be used to structure the economy of a product, creating suitable incentives for the participants to engage in certain behaviors which are mutually rewarding for them and for the system. Think of Airmiles, but linked by tokens enabled by a blockchain. In the Ethereum world, those are ERC-20 based tokens. An ERC-20 token is basically a set of functions with very basic capabilities (check balance, transfer, and so on), plus variables for the token’s name, its total supply, and the number of decimals to be used for display.
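To show how little an ERC-20 style token really contains, here is a minimal model of one in plain Python. It is only a sketch of the idea (a balance table plus a few functions), not Solidity and not the actual ERC-20 interface; the class, the token name “AirMile” and the account names are hypothetical.

```python
class Token:
    def __init__(self, name, symbol, decimals, total_supply, owner):
        self.name = name
        self.symbol = symbol
        self.decimals = decimals          # only affects how balances are displayed
        self.total_supply = total_supply  # stored in the smallest unit
        self.balances = {owner: total_supply}

    def balance_of(self, account):
        return self.balances.get(account, 0)

    def transfer(self, sender, recipient, amount):
        # Reject transfers the sender cannot cover.
        if amount < 0 or self.balance_of(sender) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[recipient] = self.balance_of(recipient) + amount

    def display(self, account):
        return self.balance_of(account) / 10 ** self.decimals

# 100 airmiles with 2 display decimals are stored as 10,000 smallest units.
miles = Token("AirMile", "AIR", decimals=2, total_supply=10_000, owner="you")
miles.transfer("you", "grandma", 8_000)
```

On a real chain the balance table lives in contract storage and every `transfer` is a paid transaction, which is where the audit trail comes from.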

This article is a good primer on ERC-20 tokens:

Drive certain user behaviour with well-designed, game-theory driven systems across a range of domains

A core use case for blockchain is the reward token, which determines how the economy of your product is defined. In bitcoin’s case, sufficient incentive was created for miners by leveraging cryptography. However, a reward token based system doesn’t have to use a cryptography-driven reward; it could be something else altogether. For instance, Medium can introduce a clapcoin, which would do the dual function of both signalling good posts and rewarding the author automatically as well. It could require a certain number of tokens to be able to comment on posts as well, again building out a self-driven incentive system. There is no need for manual, human moderation. Pair it with an A.I.-driven moderation agent and with other humans moderating content because of the built-in incentives, and suddenly you have a safe space on the Internet to communicate with others and have good discussions.

Maximize database, minimize blockchain usage

The best way to use the blockchain is to use it only for the absolute minimum need, mixed with a relational database which does the bulk of the heavy lifting. Writing transactions to the chain costs significantly more time and money than simple database writes. An intelligent blockchain strategy would involve utilizing the database for as much as possible and handing off transactions to the blockchain only when absolutely needed.

In this example there is no need for Medium to store the posts, comments, etc. on a blockchain. Even storing every clap on the blockchain right away would not make sense, as it would be terribly inefficient given blockchain’s fundamental design. Instead, a single month-end aggregate of the clapcoins collected in the database could be written to the blockchain.
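A rough sketch of that “database first, chain last” pattern, using SQLite from Python: claps accumulate as cheap database writes, and only one aggregated settlement per month is handed to the chain. The table layout, author names and the `write_to_chain` function are hypothetical; the latter is a stand-in for whatever would actually submit a transaction.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE claps (author TEXT, month TEXT, claps INTEGER)")
db.executemany(
    "INSERT INTO claps VALUES (?, ?, ?)",
    [
        ("alice", "2018-04", 10),
        ("bob", "2018-04", 7),
        ("alice", "2018-04", 5),
        ("alice", "2018-05", 2),   # next month, not part of the April settlement
    ],
)

on_chain_writes = []  # stand-in for the blockchain: one entry per paid transaction

def write_to_chain(settlement):
    # Hypothetical placeholder for submitting a single on-chain transaction.
    on_chain_writes.append(settlement)

def settle_month(month):
    rows = db.execute(
        "SELECT author, SUM(claps) FROM claps WHERE month = ? GROUP BY author",
        (month,),
    ).fetchall()
    settlement = dict(rows)
    write_to_chain(settlement)
    return settlement

april = settle_month("2018-04")
```

Three database writes for April collapse into one on-chain write here; with millions of claps the ratio is what keeps the fees tolerable.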

Chapter 6: The truth is out there — bringing full transparency and accountability

It’s inconceivable that governments in democratic countries will not move towards widespread adoption of blockchain across a range of government services, from maintaining land registries to other functions.

Or how about voting ? Shouldn’t every vote be tied to a singular eligible voter, whose identity is not known but is validated in the system ?

It’s also hard to imagine a future where blockchain is not used across the non-profit sector as well. Let’s say every “donatecoin” you gave could be tracked through where and how it was allocated by the charity you gave it to. If too much is going to administrators, then it would be known without requiring an internal whistleblower to leak such secrets.

There is significant overhead involved with using a blockchain, and someone has to pay for that, but where the need for trust and accountability is higher, it would be justified.

In the private sector though, the question worth asking is: if data can be stored on centrally managed databases and exposed via APIs, why is blockchain, with its associated overhead, better? Let’s debunk the magic of blockchain a bit.

Chapter 7: Blockchain is an extension of existing, widely-used concepts in the financial data world

Trading data warehouses are basically modelled like what would be “centralized blockchains”. All the concepts except the part about storing data on multiple distributed peers extend from bitemporal modelling, which has been done in relational databases for a long time. This is the fundamental basis of reporting data warehouse systems especially for the financial world and is a well known best practice. These are the key features of such systems:

1. Immutable records along with a timestamp, blocked together, providing a view into state “as of” that time

“What was the state of this data at any point in time in the past ?”

The time every transaction is entered into the database is recorded, along with a unique identification number for the batch in which all such transactions were processed. This allows one to query the state of the system at that point in time. The record is immutable due to process, because *no one* is supposed to alter that fundamental unit of information about a transaction. A hash of the transaction is also stored, which is what is used to compare and track changes.

Transactions are blocked together (held by a unique block id every time a data load process runs), with their timestamps recorded and immutable (due to established procedures). Given that these block ids are sequential, you can trace backwards through transaction blocks if needed.

2. Correct historical records with a complete new row instead of overwriting prior fields

“What is the most accurate representation of this data right now ?”

A second feature of such systems is that other transaction data can be corrected at a later point in time, which is useful for financial reporting systems.

This correction is not done by updating a field of a particular record directly, but instead by inserting a brand new transaction record which has the updated value along with all the prior unchanged fields, and a new timestamp recorded along with a new unique block identification number. The earlier, incorrect record is made unavailable for further queries; and at each point in time there is only one true, correct representation of data.

Using #1 above however, you can still go back in time to trace the history of updates.

An example — finding the core truth about a transaction

What bitemporal modelling looks like in standard relational databases

In the above example, we can ask the system:

  • What was the state of this particular trade as of Jan 10, 2016 at 9:00pm EST (where it was “10”), who entered it ?
  • What was the state of this particular trade as of Jan 10, 2016 at 10:00pm EST (where it was “11”), who entered it, and who updated it ?
  • What is its current state (where it is “12”) ?

In each such record, there is also the history of which other transactions it was grouped with during the data load process (the unique, system-generated block ID; a single data load process might load 100 such transactions, for instance), the unique transaction ID, the exact time the record was inserted, by whom, and so on. These form the “core truth” about each such transaction.
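For readers who have not seen this pattern, here is a minimal sketch of it using SQLite from Python. It is a simplified, append-only version of the idea rather than a full bitemporal schema. The values 10, 11 and 12 and the two query times come from the example above; the table name, column names, user names and exact insertion times are assumptions made for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE trade_history (
        block_id    INTEGER,  -- batch in which the row was loaded
        trade_id    TEXT,
        quantity    INTEGER,
        recorded_at TEXT,     -- when the row was inserted
        recorded_by TEXT
    )
    """
)

# Corrections are new rows; nothing is ever updated in place.
db.executemany(
    "INSERT INTO trade_history VALUES (?, ?, ?, ?, ?)",
    [
        (1, "T-1", 10, "2016-01-10 20:30:00", "analyst_a"),
        (2, "T-1", 11, "2016-01-10 21:30:00", "analyst_b"),
        (3, "T-1", 12, "2016-01-11 09:00:00", "analyst_a"),
    ],
)

def state_as_of(trade_id, as_of):
    # Latest row that had already been recorded at the given time.
    return db.execute(
        "SELECT quantity, recorded_by, block_id FROM trade_history "
        "WHERE trade_id = ? AND recorded_at <= ? "
        "ORDER BY recorded_at DESC LIMIT 1",
        (trade_id, as_of),
    ).fetchone()

def current_state(trade_id):
    return state_as_of(trade_id, "9999-12-31 23:59:59")

at_9pm = state_as_of("T-1", "2016-01-10 21:00:00")
at_10pm = state_as_of("T-1", "2016-01-10 22:00:00")
now = current_state("T-1")
```

The same table answers both “what did we believe at 9pm?” and “what is true now?”, which is the audit property the blockchain comparison rests on.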

These systems predate the blockchain and are used widely in the financial world. However, such time-series databases have a significant efficiency overhead, which makes their use case very limited. For blockchain, not only is it “time-series” in spirit, it is also mass-distributed and replicated, significantly adding to the overhead.

Everything which can be stored in a centralized relational database can be stored in a blockchain, with the added benefit of linking them to historically verifiable tokens which can then be traded on exchanges and used as in-product reward tokens.

Chapter 8: Manual processes are simply automated in a blockchain based on system design

1. From centralized databases -> decentralized blockchains

The blockchain concept, as popularized by Satoshi, did away with #2 above altogether, as it is more useful for reporting systems. It also provided the core invention of how you can have these transactions recorded/verified by a distributed set of nodes, with a built-in incentive mechanism for them to work. That incentive mechanism uses cryptography and certain rules to reward people for building and running such a network.

In a centralized database, the record history is immutable by process. For example, unauthorized updates would lead to the person getting fired, sued or both! In a blockchain, the record history is immutable by design: the record cannot be changed even deliberately. The tracing history exists as it does in bi-temporal modelled relational database design, along with the concept of block IDs, among others.

2. From stored procedures -> smart contracts

In a relational database like Oracle, code can be written in stored procedures and made to run automatically upon a certain set of conditions being met. This can query and update records as appropriate. Stored procedures are the boring workhorses of a relational database, normally full of convoluted code.

“Smart contracts” as enabled by the Ethereum blockchain replicate that functionality. They could have easily been called stored procedures, but that wouldn’t have been as exciting. They are “smart”, and they are “contracts”! Well, all code is. It doesn’t go out for a coffee break when you run it. Let’s not worship it; it’s not suitable for all use cases either.

Regular database stored procedures don’t need to be “paid” to be able to run them (apart from whoever is paying for the server costs), but smart contracts on a blockchain do, as they are tapping into the resources of the nodes.
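The “paid to run” part can be pictured as a meter attached to every operation. The sketch below is a toy interpreter, not the Ethereum virtual machine: it uses the gas figures from the appendix of this post (21,000 for the base transaction, 3 for an addition, 5 for a multiplication), and the function and exception names are hypothetical.

```python
BASE_TX_GAS = 21_000
GAS_COST = {"add": 3, "mul": 5}

class OutOfGas(Exception):
    pass

def run_metered(program, gas_limit):
    # Charge the base fee first, then charge each operation before executing it.
    gas_used = BASE_TX_GAS
    if gas_used > gas_limit:
        raise OutOfGas("gas limit does not cover the base transaction")
    results = []
    for op, a, b in program:
        gas_used += GAS_COST[op]
        if gas_used > gas_limit:
            raise OutOfGas(f"ran out of gas at {op}")
        results.append(a + b if op == "add" else a * b)
    return results, gas_used

# Reservation cost: price per night times number of nights.
results, gas_used = run_metered([("mul", 120, 3)], gas_limit=30_000)
```

A stored procedure would run the same multiplication without any meter; here the caller has to fund 21,005 gas up front or the call fails.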

3. From in-app currencies -> blockchain-based tokens

A lot of apps and organizations run their own reward programs. The main difference with blockchain tech has been that now that reward point is called a token, and is needed for the underlying functioning of the smart contract and the blockchain.

For example, it’s as if you collected 100 airmiles, and then you had to pay 5 airmiles just to make a booking on a website, or you wanted to send 80 airmiles to grandma on her birthday. Now you can do that with a blockchain-based token. And if for some reason you also needed an open audit trail associated with your use of these airmiles, then this is the perfect solution.

What if speculators engaged in buy/sell of such airmiles — and if your holdings increased in value over time ? That then turns into another asset class for you, another way to earn wealth, from something which was completely locked into its own database sitting in obscurity, now you have the ability to trade it.

Chapter 9: Why not to use a blockchain ?

1. Blockchains are incredibly inefficient: slow, with a ridiculously low transactions-per-second capacity

Users have come to expect relational database level read/write speeds. To ask them to wait because of underlying technology would simply be a no-go. Unless blockchains are able to provide that same level of performance, their real-world use cases for end-consumers would remain theoretical; pumped up visions only used to sell tokens to unsuspecting investors.

Consider that Ethereum supports only about 15 transactions per second! From the scaling article below: “It depends on a network of ‘nodes’, each of which stores the entire Ethereum transaction history and the current ‘state’ of account balances, contracts and storage”. It’s designed to be inefficient.

Why would you want your consumer-oriented product to be built on such a network ? Why would you choose it over a relational database hosted on a cloud platform like AWS or Google Cloud ? Do you still download movies and songs on BitTorrent or prefer the much faster service provided by Netflix or iCloud ?

It’s worth asking the question: why does using the blockchain in this case make any sense from a user perspective? Does it make their life better or worse when they use your product?

Even though centralized databases are expensive and difficult to maintain, for enterprises the logical migration path is moving to the cloud. They need speed, consistent uptime and support, and they have been building systems that provide an accurate statement of truth about each transaction based on internal processes, without a blockchain.

2. Privacy for certain use cases: no one wants their full balance and transaction history for everybody to see

How would regular users react when they find out that their entire transaction history and their wallet balance are publicly available for anyone to search through? Even though it is not linked to their name, at the end of the day does anyone want their bank account number, with balances and transaction history, published online?

And since the knowledge of the private key is the only thing holding back transfer of funds — bad actors can exploit/threaten remotely to gain access and transfer your funds; unlike a bank which would at least put up a fight for you as its reputation depends on that.

This also means enterprises would go for a private blockchain at best. There would eventually be an “Oracle of blockchains” — a company big enough and fully committed to providing optimized deployment of a private blockchain, including support and consulting services. It might help in inter-company connections as well.

Blockchain-as-a-Service: Losing the plot ?

Such deployments, provided by leading cloud providers, would help to alleviate some pain points while providing benefits. But then again, you are running a decentralized network on a centralized one, and you have to ask: what is the point and where does it add value? Why not just open up your relational database with suitable access in that case?

Conclusion

We are in an era of “crypto-driven” business model disruption. However, blockchain for the heck of it is a road to nowhere.

We don’t need to ever know who Satoshi is. What we do know is that the core idea pioneered with the deployment of Bitcoin just makes a lot of sense, and that idea is how to design a well-rounded, self-driven incentive system for your product.

This is as good a time as ever to design intelligent blockchain-driven ecosystems which make use of reward tokens and bring greater transparency and smarter systems. Humans and A.I. would fit into this world of smarter systems. However, blockchain is a hammer, not a Swiss Army knife, so please use with caution :)

Thank you for reading

Appendix: Source for Ethereum usage cost

Spreadsheets shared by Danny

https://docs.google.com/spreadsheets/d/1KeWKkn0BYhOt1p6lM6BDQAWLin-2JQmGpwswU3kPw9c/edit#gid=0

https://docs.google.com/spreadsheets/d/1n6mRqkBz3iWcOlRem_mO09GtSKEKrAsfO7Frgx18pNU/edit#gid=0

Breakdown

  • Adding two integers: 3 gas units
  • Multiplying two integers: 5 gas units
  • Base transaction: 21,000 gas units
  • If you are interacting with a contract which multiplies two integers: 21,000 gas + 5 = 21,005 gas units
  • 1 Gwei = 0.000000001 ETH
  • Gas price specified in gwei/gas; current median 30 gwei; some pay less, some pay more for faster transaction processing
  • Total fee paid per transaction = Gas price * Gas units used
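The breakdown above can be turned into a small calculator. This sketch uses the 30 gwei gas price and the 1 ETH = 1402.44 USD price from this post as defaults; the function name and parameters are illustrative, not taken from the spreadsheets.

```python
from decimal import Decimal

GWEI_PER_ETH = Decimal(10) ** 9   # 1 gwei = 0.000000001 ETH

def fee_usd(gas_units, gas_price_gwei=Decimal(30), eth_usd=Decimal("1402.44")):
    # Total fee = gas price * gas units used, converted from gwei to ETH to USD.
    fee_gwei = gas_units * gas_price_gwei
    return fee_gwei / GWEI_PER_ETH * eth_usd

base_transfer = fee_usd(21_000)          # plain transaction
contract_multiply = fee_usd(21_000 + 5)  # transaction that multiplies two integers
multiply_only = fee_usd(5)               # the multiplication on its own
```

Rounded, these come to about $0.88 for a base transaction and about $0.0002 for the multiplication alone, matching the figures used in Chapter 2.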
