As I wrote in my article “Understanding The Gold Rush of Scalable and Validated Data powered by Blockchain and Decentralized AI” for Hackernoon:
The best results in the AI field come from closed and well-defined ecosystems, such as video games, where AI algorithms have beaten world champions, even in DOTA 2, considered one of the most complex video games in the industry… …In open environments like social media or big data, AI algorithms have performed worse, and sometimes their results are dangerously wrong.
In scripted environments like video games, you can train an AI on a limited number of pre-defined actions by reading the code of the game, and in this kind of environment machine learning algorithms can make decisions based on that.
In open, non-scripted environments, things are a little more complicated. In social media, for example, AIs have to deal with fake news, self-reported data, bots and so on. You can create algorithms smart enough to recognize some of them, but remember that in this case you need a lot, and I mean “A LOT”, of computational power to run every calculation, so at the end of the day it’s incredibly expensive and still produces a large percentage of errors.
Usually, the most useful AIs in open environments are built by the same company that owns the platform, because it can read the scripts its own servers run every time users take specific actions, understanding the quality of the data and how to use it.
As Thomas C. Redman wrote in his article “If Your Data Is Bad, Your Machine Learning Tools Are Useless” for Harvard Business Review:
Poor data quality is enemy number one to the widespread, profitable use of machine learning. While the caustic observation, “garbage-in, garbage-out” has plagued analytics and decision-making for generations, it carries a special warning for machine learning.
In addition to this, there are other hidden problems that make AIs dangerous to trust.
Let’s start the discussion with this assumption:
“Trusting an AI trained on centralized data is the most dangerous thing ever!”
Let’s figure out why this is a real problem to consider: if you trust an AI, you’re trusting the dataset used to train it. You can deploy the smartest AI algorithms, but if you train them on fake news or manipulable data, the results will be disappointing.
As Jeremy Epstein wrote in his article “Why you want blockchain-based AI, even if you don’t know it yet” for VentureBeat:
“If you are going to trust your decision-making to a centralized AI source, you need to have 100 percent confidence in:

1. The integrity and security of the data (are the inputs accurate and reliable, and can they be manipulated or stolen?)
2. The machine learning algorithms that inform the AI (are they prone to excessive error or bias, and can they be inspected?)
3. The AI’s interface (does it reliably represent the output of the AI and effectively capture new data?)
In a centralized, closed model of AI, you are asked to implicitly trust in each layer without knowing what is going on behind the curtains.”
Whether you’re a company, a government, or even a casual user, before trusting an AI to make a decision or starting automatic AI-based decision-making, you have to ask yourself:
How can we trust AI algorithms if we’re training them on a manipulable database, in environments where hackers or owners can edit the data for whatever reason?
As Maria Korolov argued in her article “AI’s biggest risk factor: Data gone wrong” for CIO Magazine, interviewing several AI engineers:
“The algorithms are easy and interesting, because they are clean, simple and discrete problems,” he says. “Collecting, classifying and labeling datasets used to train the algorithms is the grunt work that’s difficult — especially datasets comprehensive enough to reflect the real world. Even if you have the data, you can still run into problems with its quality, as well as biases hidden within your training sets. These kinds of intrinsic biases may be difficult to identify, but at least they don’t involve data sources actively trying to mess up the results. Take the spread of fake news on social media, for example, where the problem is getting worse.”
If you own a dataset and someone makes decisions based on it, technically you rule them, because you can manipulate the data and, as a consequence, manipulate the choices of others. For example, today we trust big tech companies and their datasets because they act in good faith and have no economic incentive to manipulate information. But companies are run by people, and in the future the boards of directors of these companies could change, for good or for bad.
As Caroline Sofiatti explained in her presentation “The role of a decentralized data marketplace in the future of AI” at the Artificial Intelligence Conference in San Francisco:
“The moment you entrust your data to a regular database, you also become dependent on the human organization in which that database resides. You are not the one in control.”
A strong quote that captures the importance of this problem comes from Peter Thiel, published by MIT Technology Review:
“Crypto is decentralizing, AI is centralizing. Or, if you want to frame it more ideologically, crypto is libertarian and AI is communist.”
If you want to build a smart AI that actually works, it needs to be able to understand what is trustworthy and what is not. To do this, you need to read the scripts that users run when they take actions and create pieces of information.
The problem in Web 2.0 is that if you don’t own a network, you can’t read users’ interactions through the scripts, only the data they produce, because the scripts are 100% private and run on the owner’s servers.
In AI terms, you can only understand a posteriori which data are credible, using complicated procedures and a lot of computational power. This methodology has not yet led to performant, reliable results.
If we analyze the basic features of a Blockchain network such as Bitcoin, every action made by users is designed to generate perpetual, non-manipulable data without owners. This is an amazing starting point for building trustable AIs.
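To make the “non-manipulable” part concrete, here is a minimal sketch in Python, a toy hash chain rather than Bitcoin’s real data structures, showing why editing a historical record is immediately detectable:

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents, including the previous block's hash."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def build_chain(records: list) -> list:
    """Link every record to the hash of the block before it."""
    chain, prev = [], "0" * 64  # genesis placeholder
    for data in records:
        block = {"data": data, "prev_hash": prev}
        prev = block_hash(block)
        chain.append({**block, "hash": prev})
    return chain

def verify(chain: list) -> bool:
    """Recompute every hash; an edited block breaks all links after it."""
    prev = "0" * 64
    for block in chain:
        body = {"data": block["data"], "prev_hash": block["prev_hash"]}
        if block["prev_hash"] != prev or block_hash(body) != block["hash"]:
            return False
        prev = block["hash"]
    return True

chain = build_chain(["alice pays bob 1", "bob pays carol 2"])
print(verify(chain))                          # True
chain[0]["data"] = "alice pays mallory 100"   # tampering with history...
print(verify(chain))                          # ...False: it is detectable
```

In a real network like Bitcoin, thousands of nodes run this kind of verification independently, which is why the data has no single owner who can quietly rewrite it.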
As Salih Sarikaya wrote in his article “How Blockchain Will Disrupt Data Science: 5 Blockchain Use Cases in Big Data” on Towards Data Science:
“If big is the quantity, Maria Weinberger of Janexter says, blockchain is the quality. This follows the understanding that blockchain is focused on validating data while data science or big data involves making predictions from large amounts of data.”
The killer application in Blockchain Technology that is opening unprecedented opportunities for building smart and trustable AIs is the “Smart Contract,” available since the invention of Second Generation Blockchains like Ethereum. Before 2014 and the concept behind Ethereum, Blockchain Technology could only track financial transaction data in a Trustless way.
If we analyze the anatomy of a Smart Contract, it is a perpetual, decentralized script that runs publicly.
As I wrote in my article “Decentralized Data: Why Blockchain is meaningless and Trustless is everything” for Hackernoon:
With the introduction of Smart Contract Technology, a developer who wants to interact with the Ethereum Network basically has to decentralize his functions. Every time a User interacts with a Decentralized Function, he has to make an Ethereum transaction. During the Ethereum transaction, the network validates inside the block the called function, the request and the result. These three aspects are fundamental to delivering Decentralized Data in a completely Trustless way.
Smart Contracts can finally write decentralized data that records the script itself, the request, and the result, without any possibility of manipulation and without owners.
This can be a huge step forward if we train AIs on such data, because with this technology we know for sure the script and the request that generated a piece of data, without having to trust any entity.
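As a minimal sketch of what this looks like in practice, the snippet below uses web3.py to recover the called function and its request from a confirmed transaction, with the receipt’s event logs as the result. The RPC endpoint, contract address, ABI and the submitRating function are all hypothetical placeholders, not a real contract:

```python
from web3 import Web3

# Hypothetical endpoint, address and ABI; swap in real values to use this.
w3 = Web3(Web3.HTTPProvider("https://mainnet.example-rpc.invalid"))

CONTRACT_ADDRESS = "0x0000000000000000000000000000000000000000"
CONTRACT_ABI = [{
    "name": "submitRating",          # hypothetical decentralized function
    "type": "function",
    "stateMutability": "nonpayable",
    "inputs": [{"name": "itemId", "type": "uint256"},
               {"name": "score", "type": "uint8"}],
    "outputs": [],
}]

contract = w3.eth.contract(address=CONTRACT_ADDRESS, abi=CONTRACT_ABI)

def decode_interaction(tx_hash: str) -> dict:
    """Recover the script (function), the request (arguments) and the
    result (event logs) of a user interaction directly from the chain."""
    tx = w3.eth.get_transaction(tx_hash)
    receipt = w3.eth.get_transaction_receipt(tx_hash)

    # The "request": which public function was called, with which arguments.
    func, args = contract.decode_function_input(tx["input"])

    return {
        "caller": tx["from"],
        "function": func.fn_name,           # the validated, public script
        "request": dict(args),              # the validated inputs
        "succeeded": receipt["status"] == 1,
        "logs": receipt["logs"],            # the "result" emitted as events
    }
```

Anyone with access to a node can run this decoding, which is exactly the point: the script, the request and the result are readable by everyone, not only by the platform owner.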
If we start collecting massive amounts of clean, decentralized data, we’ll take a huge step forward in terms of trust in AI, and we’ll be able to start using it for business decisions, ensuring a future as far as possible from a “1984 scenario” dependent on the people who rule big corporations.
But the real next step for this technology will come from the decentralized, publicly running nature of smart contracts. At the beginning of this article, I argued why AI algorithms work better in scripted environments than in open ones. In open environments, the challenge is to understand the purpose of the private, unreadable scripts that generate pieces of information, in order to judge the quality of the data.
With smart contracts, the opportunity is to make open environments behave like closed, scripted ones by translating as much complexity as we can into code. This new architecture would eliminate many data science problems around understanding and cleaning datasets, reduce the computational power needed to assess data quality, and ultimately help AIs improve the precision of their results and predictions.
Today we can build Hybrid Decentralized AIs: basically, running AI algorithms privately in the cloud but training them only on trusted, Blockchain-based decentralized data.
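A minimal sketch of this hybrid pattern, reusing the hypothetical decode_interaction() helper from the sketch above and assuming scikit-learn for the private, off-chain model:

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical hashes of confirmed transactions that called submitRating;
# decode_interaction() is the helper defined in the previous sketch.
tx_hashes = ["0xabc...", "0xdef..."]
records = [decode_interaction(h) for h in tx_hashes]

# Features come only from validated on-chain requests, not self-reported data.
X = [[r["request"]["itemId"], r["request"]["score"]] for r in records]
y = [r["request"]["score"] >= 3 for r in records]  # toy label, illustration only

# The model itself still runs privately, off-chain, "in the cloud".
model = LogisticRegression().fit(X, y)
```

The on-chain half guarantees the provenance of every training row; the off-chain half keeps the expensive computation where it is still cheapest today.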
This is a huge step forward, but achieving fully decentralized AI in the next decade will depend on how successfully we build Blockchain networks that can run AI algorithms in the same Trustless way we run scripts in Smart Contracts today.