Jeremiah Smith

@itsJeremiahS

Data Marketplaces: The Holy Grail of our Information Age

This is part 2 of the Data Deep Dive series. Part 1 describes how data is transforming virtually all aspects of our economy and society, and argues that the true promise of the Data Economy remains largely unfulfilled because we still lack the technology for standard, secure and efficient data exchange.

Enter data marketplaces

First off, what is a data marketplace? Just like any marketplace, it is a platform which enables convenient buying and selling of a resource — in this case, data. In practice, a data marketplace is a piece of software which data providers and data consumers connect to through a graphical or backend interface to buy and sell data from each other.

Just as shares and currencies are traded on different types of exchanges, different types of data and use cases require different types of marketplaces. For instance, a personal data marketplace could enable individuals to choose who they sell their personal data to and directly pocket the proceeds, while a business data marketplace could allow two companies to buy and sell industry data from each other, such as localized product prices, insurance claim statistics or data about recent investment deals in a given industry.

Key properties of data marketplaces

To understand why data marketplaces are the missing backbone of the data economy, let’s revisit the 3 fundamental roadblocks, outlined in part 1 of the Data Deep Dive, that hold back its full potential.

  • Most data exists in unrefined (or unstructured) form and it is non-trivial to convert it into structured data, the format needed for use in software.
  • Data owners structure their data with incompatible data models and keep it in isolated silos, even though it is often sought after by others.
  • No one has figured out how to price and exchange data efficiently yet.

Although different data marketplaces have varied properties depending on their specific use case, in general, the data marketplace paradigm allows for:

  • Crowdsourcing: by making self-serve data selling a reality, they provide the solution to move away from inaccurate/expensive single-source data.
  • Aligned incentives: data owners/collectors directly benefit from keeping data in structured form and making it available to others.
  • Standardization: by design, a marketplace defines a common data model and interface for buyers and sellers to exchange data.
  • Fairness: instead of having a central authority price the data, providers can set their own prices while consumers can choose who they buy from.
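To make the standardization and fairness properties more concrete, here is a minimal sketch of what a shared listing format could look like. It is an illustration only: the `Listing` class, its field names and the `cheapest_offer` helper are hypothetical and do not describe any particular marketplace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """One offer in a hypothetical marketplace; all names are illustrative."""
    seller: str
    dataset: str               # key taken from the marketplace's common data model
    price_per_record: float    # set by the seller, not by a central authority


def cheapest_offer(listings, dataset):
    """Return the lowest-priced listing for a dataset, or None if nobody sells it."""
    offers = [item for item in listings if item.dataset == dataset]
    return min(offers, key=lambda item: item.price_per_record, default=None)


# Three sellers describe their data with the same model, so offers are comparable.
listings = [
    Listing("seller_a", "product_prices", 0.05),
    Listing("seller_b", "product_prices", 0.03),
    Listing("seller_c", "insurance_claims", 0.20),
]

best = cheapest_offer(listings, "product_prices")
print(best)
```

Because every seller uses the same fields, a buyer can compare offers directly and pick a seller on price (or, in a fuller version, on quality), which is what lets the market rather than the platform decide what data is worth.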

A better model for knowledge sharing

Ultimately, a data marketplace can be thought of as a knowledge sharing platform which aligns data consumer and data owner incentives better than current data sourcing methods.

For instance, manual data collection is a tedious process for data consumers and provides no value to data owners. Similarly, web scraping is slow and/or expensive for consumers, sometimes illegal, and again gives no value to owners. Ad hoc data deals between data consumers and isolated data providers, on the other hand, tend to place a lot of power in the hands of the providers, giving them little incentive to provide affordable, high-quality data.

So, paradoxically, data flow in our so-called Information Age is broken. This makes data acquisition an unnecessarily costly task and at the same time is a huge missed opportunity for data owners who often sit on mountains of unutilized or underutilized data — lose-lose.

The reality of working with data today: data scientists spend over half their time collecting, cleaning and organizing data before they can use it [Figure Eight].

In a marketplace setting, however, buyers only pay for the data they consume. This creates a strong incentive for data providers to offer the highest-quality, most sought-after data possible, as this maximizes their revenue. Similarly, the infrastructure and inherent data exchange standard created by the marketplace remove friction between data buyers and sellers, which lowers the cost of acquiring data. Together, these address the first 2 roadblocks that keep the data economy from fulfilling its true potential.

In turn, better value for consumers is likely to increase demand, and thus provide more potential revenue for data providers. This ultimately makes it a better deal for both parties than the status quo (think Airbnb) and forms a positive feedback loop.

Finally, by allowing data providers to set their own data prices and enabling data consumers to choose who they purchase data from, not only do data marketplaces allow consumers to signal which data/sellers provide value, but they also solve the data pricing conundrum by taking a free market approach.

Since many organizations already collect data as part of their normal operation, data marketplaces are set to expand the breadth and depth of data available today by at least an order of magnitude in the very near future, as Wikipedia did for encyclopedic knowledge.

The last Encyclopedia Britannica was printed in 2010 and covered 500,000 topics. Today, Wikipedia has over 5 million articles, with 600 new articles per day and 10+ edits per second [Wikimedia, source].

Blockchain: the missing link

Now that we’ve settled the theory, let’s switch to practice. To create data marketplaces which deliver on the promises above, we need:

  • A way to make data marketplaces as open as possible — to crowdsource as much data as possible — yet protect buyers and sellers from bad actors.
  • A way to guarantee data sellers get paid — and get paid the right amount — each time their data is purchased.
  • A fast, secure and scalable micropayment infrastructure to allow freedom of use: buyers should only have to pay for the data they consume.
  • A way to guarantee data provenance to ensure purchased data hasn’t been tampered with and actually comes from the alleged seller.
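As a rough illustration of the tamper check in the last requirement, a seller could publish a cryptographic fingerprint of a dataset when listing it, and the buyer could recompute that fingerprint on delivery. The sketch below uses only Python's standard library. The function names and sample data are hypothetical, and it covers the integrity half only: proving that the data really comes from the alleged seller would additionally require a digital signature, which is not shown here.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of a dataset as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_untampered(received: bytes, published_digest: str) -> bool:
    """Check delivered bytes against the digest published at listing time."""
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(fingerprint(received), published_digest)


# Hypothetical flow: the seller publishes the digest, the buyer verifies delivery.
original = b"city,price\nParis,4.20\nBerlin,3.90\n"
published = fingerprint(original)

delivered_ok = original
delivered_bad = original.replace(b"4.20", b"9.99")

print(is_untampered(delivered_ok, published))
print(is_untampered(delivered_bad, published))
```

If the published digest lives somewhere neither party can quietly rewrite, such as a blockchain record, the buyer does not have to trust the delivery channel to know the data is unchanged.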

The advent of blockchain smart contracts finally makes it possible to enforce these properties without requiring data providers and data consumers to trust each other or the marketplace. When dealing with authentication of marketplace participants, data and payments, trust is a critical factor. In fact, centralization of trust is likely one of the main reasons why previous attempts to create data marketplaces have failed, most notably the Microsoft Azure DataMarket, which closed down in March 2017 [Microsoft] after 7 years.

We now look at the new generation of ventures which are leveraging blockchain technology to make data a tradable asset.

Blockchain-powered data marketplaces are coming!

Data marketplaces are like modes of transportation. Bicycles, cars and airplanes are each a distinct technology that is ideal for different use cases, and a universal solution that invariably trumps all others seems unlikely.

Because data stakeholders are so diverse and use cases for data inexhaustible, an intuitive way to categorize data marketplaces is by the type of data they allow participants to exchange: personal, business and sensor data.

Some of the key properties of blockchain-powered data marketplaces categorized by data type [The DX Network]

Personal data marketplaces are on a mission to empower consumers by allowing them to monetize their data directly and on their own terms. Examples include Datum, DataWallet and fysical which allow users to sell anything from their email address to social media streams or their location.

The key characteristic of personal data marketplaces is that they have a consumer-facing component, e.g. a mobile app, whose purpose is to collect data, ensure data is stored/delivered securely (locally or on third-party decentralized storage) and provide the interface for users to manage data purchase requests. On the buyer side, individual datasets are purchased from target users through an API, to be integrated into an internal product, lead generation process, marketing campaign etc.

Business data marketplaces are designed to enable efficient business-to-business data exchange, focusing on structured data and large data providers/consumers. This is the case for Ocean Protocol and our own DX Network, which provide platforms to trade enterprise knowledge such as industry-specific data or scientific experiment results.

What is unique about business data marketplaces is that data offered by providers is usually not related to the providers themselves. This has a fundamental impact on their underlying technology, which must have many properties of shared databases, whereas personal and sensor data marketplaces are closer to data catalogs. As a result, data is aggregated within the marketplace and served ready for immediate use. Both providers and consumers interact with the marketplace through an API geared towards business intelligence, research, machine learning etc.

Sensor data marketplaces allow for the purchase of real-time data feeds from remote devices. For instance, IOTA Data Market, DataBroker DAO and Streamr offer pollution, power grid and vehicle telematics data feeds.

The characteristic property of sensor data marketplaces is the real-time nature of the data for sale. Currently, sellers list their sensors on a marketplace interface, together with a price per period or per reading, and buyers use the same interface to subscribe to data streams. In the future, IoT devices could use marketplace APIs to automatically list themselves and monetize their data streams and/or autonomously buy data streams from other devices, to improve urban living, transportation, manufacturing etc.
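The per-reading pricing described above can be sketched as a simple meter. This is a toy model under stated assumptions: the `StreamSubscription` class, the sensor name and the price are hypothetical, and amounts are kept in integer micro-units of some token so that many tiny charges add up without floating-point rounding.

```python
from dataclasses import dataclass


@dataclass
class StreamSubscription:
    """A buyer's subscription to one sensor feed (hypothetical model)."""
    sensor_id: str
    price_per_reading: int        # in micro-units of a token, set by the seller
    readings_delivered: int = 0

    def record_reading(self) -> None:
        """Count one delivered reading against the subscription."""
        self.readings_delivered += 1

    def amount_due(self) -> int:
        """Total owed so far: the buyer pays only for readings received."""
        return self.readings_delivered * self.price_per_reading


sub = StreamSubscription(sensor_id="air-quality-042", price_per_reading=250)

# Simulate three readings arriving from the device.
for _ in range(3):
    sub.record_reading()

print(sub.amount_due())
```

In a real marketplace this bookkeeping would sit behind the micropayment infrastructure rather than in the buyer's own code, but the principle is the same: the charge tracks consumption reading by reading.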

If you know of another data marketplace which should have been mentioned above, leave a comment below or reach out at twitter.com/itsJeremiahS

Smart contracts are the key

Blockchain-powered data marketplaces are the innovation which will drive the emerging data economy. The next Data Deep Dive will give the practical details of how smart contracts are used to decentralize trust in data marketplaces.
