This is the third post in our series, “Getting APIs on the Blockchain”. Previously, we defined and contextualized the importance of modern-day web APIs and introduced The API Connectivity Problem.
The word “oracle” is (unfortunately) still associated with its historical and mystical connotations, namely a person with divinatory abilities. A blockchain oracle, however, is simply a piece of software that takes information living outside the blockchain and records it on-chain, effectively acting as a bridge between the off-chain and on-chain worlds (a bridge with varying degrees of security, I must note).
Our series Getting APIs on the Blockchain won’t focus too much on oracles — in particular, because we focus our sights on the API Connectivity Problem rather than the Oracle Problem — however, we will dedicate a few articles to oracles, given they are a necessary part of the solution to the API Connectivity Problem. You need an oracle node that records data from off-chain APIs onto the blockchain.
However, importantly, we make a clear distinction between first-party and third-party oracles. First-party oracles are operated by the API providers themselves. Third-party oracles are not operated by the owners of the data they serve, acting as middlemen between the data source and the blockchain.
The “Oracle Problem” is an abstract and overgeneralized problem that has been somewhat more formalized as: two arbitrary systems needing to interoperate with one another — through their technical interfaces — in the most general sense imaginable. This over-generality necessitates an ever-flexible interface that can only be supported by third-party oracles.
Solutions born of an underspecified problem, however, are often not optimal because the practical scope of the problem is far more constrained. In most cases, especially given the current state of the blockchain space, the decentralized interoperability problem is in practice the problem of receiving services from traditional API providers in a decentralized way.
I will now list several problems that arise when third parties are used as an interface layer between APIs and the blockchain.
The addition of a “decentralized interface” creates an entirely new attack surface for blockchain applications. Groups of malicious third-party oracles can collude to manipulate outcomes — and, in fact, a single oracle can skew outcomes when performing “consensus” on real-valued, continuous data (a more formal treatise on such issues is forthcoming).
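To make the single-oracle point concrete, here is a minimal sketch, not the formal treatment promised above. It assumes a feed that aggregates five oracle reports with either a mean or a median; the `aggregate` helper and all numbers are hypothetical.

```python
from statistics import mean, median


def aggregate(reports, method):
    """Combine per-oracle reports into a single value (hypothetical helper)."""
    if method == "mean":
        return mean(reports)
    if method == "median":
        return median(reports)
    raise ValueError(f"unknown method: {method}")


# Hypothetical reports from five honest oracles for the same price.
honest = [100.0, 100.2, 99.9, 100.1, 99.8]

# The same feed, except the last oracle misreports an extreme value.
one_liar = honest[:-1] + [10_000.0]

# With a mean, one outlier drags the result far outside the honest range.
print(aggregate(honest, "mean"), aggregate(one_liar, "mean"))

# With a median, the result stays inside the honest range but still shifts.
print(aggregate(honest, "median"), aggregate(one_liar, "median"))
```

Under these assumptions the median limits the damage a single oracle can do, but it does not remove that oracle's influence: the reported value still moves in the direction the misreporting oracle pushes it.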
What’s more, a single actor can fabricate multiple oracle node operator identities — as well as build a sufficient track record of honest operation — to perform the same types of attacks entirely by themselves. This is widely referred to as a Sybil attack.
To counteract the vulnerability of an entirely new attack surface, the benefit an oracle gains from acting honestly must at all times exceed (by a lot, ideally) the amount it could gain from misreporting; otherwise, malicious behaviour becomes profitable. The game-theoretic argument is treated in much more detail in Section 3.2 of the API3 whitepaper. This is an additional cost (a tax, if you will) on top of the actual cost of the data, and it must be paid to these third-party middlemen.
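The incentive condition can be sketched with a deliberately simplified model. This is my own illustration and not the whitepaper's formulation: it assumes an oracle earns a fixed reward per period, discounts future rewards geometrically, and loses all future income after a one-off attack. The function names and numbers are hypothetical.

```python
def honest_value(reward_per_period, discount):
    """Discounted value of earning the same reward forever (geometric series)."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return reward_per_period / (1 - discount)


def honesty_is_rational(reward_per_period, discount, max_attack_gain):
    """True if staying honest is worth more than the best one-off attack."""
    return honest_value(reward_per_period, discount) > max_attack_gain


# Hypothetical numbers: an attack could net 1000 units.
print(honesty_is_rational(10, 0.9, 1000))   # reward too low to deter the attack
print(honesty_is_rational(200, 0.9, 1000))  # reward high enough to deter it
```

In this toy model, the per-period reward has to grow with the value an attacker could extract, which is where the extra cost paid to the middlemen comes from.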
Data feeds that depend on third-party oracles require excessive redundancy at the oracle level. This is because third-party oracles are far less trustworthy than API providers (as already discussed: the latter have traditional off-chain businesses in “meatspace” and respective reputations to maintain).
Note that this decentralization does not provide any additional security at the data source level. It serves only to reduce the additional vulnerability caused by using third-party oracles in the first place. This redundancy increases gas costs and, at the very least, the costs associated with operations personnel.
As discussed in our previous post: a price feed fed by x oracles rarely represents x unique data points. The number of oracles serving a data feed does not necessarily correspond to higher quality and more robust data — it’s a bit of a game of optics. Oftentimes, the users of decentralized oracle networks overlook this fact and confuse decentralization at the oracle level with the overall decentralization of the system. This is primarily caused by a lack of transparency regarding the data sources used by the oracles, which disguises the fact that decentralization is severely bottlenecked at the data source (API) level.
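A small sketch of the gap between oracle count and source count, assuming we somehow knew which upstream API each oracle reads from (which is exactly the information that is usually not disclosed). The oracle and API names are hypothetical.

```python
from collections import Counter

# Hypothetical mapping: oracle -> upstream API it pulls its data from.
feed = {
    "oracle_a": "api_one",
    "oracle_b": "api_one",
    "oracle_c": "api_two",
    "oracle_d": "api_one",
    "oracle_e": "api_two",
}


def source_diversity(oracle_to_source):
    """Return (number of oracles, number of distinct data sources)."""
    return len(oracle_to_source), len(set(oracle_to_source.values()))


def largest_source_share(oracle_to_source):
    """Fraction of oracles that depend on the single most-used source."""
    counts = Counter(oracle_to_source.values())
    return counts.most_common(1)[0][1] / len(oracle_to_source)


print(source_diversity(feed))      # five oracles, but only two sources
print(largest_source_share(feed))  # share of oracles relying on one API
```

In this made-up feed, five oracles reduce to two independent data points, and a majority of them depend on the same API.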
Importantly, when data feeds are not transparent about where their data comes from, it becomes quite difficult for developers to accurately appraise the quality of the data feed. And the quality of the data feed should come into question precisely because data sources are obscured. Oracles have an incentive to gather cheap and easily accessible data since little is enforcing or incentivizing them to do otherwise.
This is a particularly salient point for price feeds since some API providers use advanced filtering and aggregation methods (and price accordingly for their services). A particularly inspiring example is CoinMarketCap’s volume inflation detection algorithm. Clearly, with this widely applicable use case, data source matters.
Using first-party oracles leverages the off-chain reputation of the API provider, something we often take for granted in the measurable, deterministic, and closed world of the blockchain. That is, although API providers might not be explicitly staking (or using some other quantifiable measure) to prove their trustworthiness, their trustworthiness is evidenced by their reputation and success in the off-chain business world (to varying degrees).
Further, the company’s revenue acts as a kind of “soft stake”. Gambling with one’s off-chain, real-world reputation (and thus one’s revenue stream) acts as a major deterrent to malicious first-party oracle behaviour. Take for example the Coinbase oracle which offers signed price data for select markets, leveraging their trustworthiness as a business — even more so as a major business in the crypto/blockchain space that benefits from the health of the entire ecosystem and is thereby less likely to act maliciously than a no-name third party.
Further, it is often taken for granted in third-party oracle solutions that the data source is already trustworthy. Indeed, in the Chainlink network, for example, a user can select where third-party oracles gather their data from via job specifications and adapters (although proving that the oracles do in fact gather their data from those sources is another matter). Such a construction de facto assumes data sources are trustworthy and that it’s only a matter of finding trustworthy middlemen to transport that data on-chain.
In contrast, first-party oracles tap into the wellspring of trust and reputation already established and maintained in “meatspace”.
To conclude, I argue that if a first-party oracle exists for a particular data source — that is, an API provider directly servicing their data on the blockchain via an oracle — then there is no benefit to third-party oracles.
(I should quickly note that I am not claiming first-party oracles are perfectly trustworthy. Of course not. Issues such as downtime and data quality will be addressed in later posts.)
Now, what about that conditional (“if”)? You may be asking: what if there isn’t a first-party oracle? How can we assume the existence (or future existence) of abundant first-party oracles? In the next blog post I’ll cover present hurdles to first-party oracle integration (and why we don’t see more of it) as well as how API3 tackles those exact challenges. Stay tuned!