Unless you’ve been living under a literal rock (and not on top of one), you’ve probably heard of ChatGPT - the groundbreaking (and, depending on who you ask, dystopia-inducing) dialogue-based AI system.
Its extremely conversational manner has its users pushing it to its limits.
Most are in awe of its ability to write code in real time or produce polished, seemingly original essays.
At a glance, ChatGPT is quite impressive. While this technology has existed for some years now - with other companies even launching similar initiatives in the past - ChatGPT managed to attract one million users in under a week.
From a product perspective, this was certainly proof that ChatGPT fulfilled a need within the market. It will most likely change the gig economy forever, as it essentially enables an interactive Google search with much more concise and actionable results in real-time.
However, the conversation around AI almost always runs parallel to the one about ethics - many began to question the potential dangers of making this model available to everyone.
As history has shown, humans have a poor track record when it comes to teaching AI to say things that shouldn’t be uttered, let alone thought.
On a more philosophical level, what is the source of truth for ChatGPT?
What about other, future GPT-based systems?
How do we verify which biases, datasets, and parameters are being factored in without compromising the security of the AI?
OpenAI actually acknowledges these concerns (described as “limitations”) in
Before solving the inevitable AI chatbot uprising, let’s take a brief, bird’s-eye look at how it actually works.
ChatGPT is based on GPT-3.5 - a slightly newer, better version of GPT-3.
GPT stands for Generative Pre-trained Transformer; the number denotes the generation of the model.
“It’s an autoregressive language model that uses deep learning to produce human-like text. Given an initial text as a prompt, it will produce text that continues the prompt.” -
In simpler words, it’s a predictive language-processing model trained specifically to produce human-readable text. This capability is measured with the Turing Test, the goal being that the AI-generated text should be indistinguishable from its human-written counterpart.
At its core, GPT tries to predict the next word (token) in a sequence. While the model is being trained, it keeps tuning its inner variables until its predictions match the expected output.
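To make that “predict, then tune” loop concrete, here is a minimal, hypothetical sketch in Python. It trains a toy character-level bigram model - nothing like GPT’s architecture or scale - to predict the next character, nudging its weights whenever it guesses wrong; the corpus, learning rate, and step count are all invented for illustration.

```python
# A toy illustration of next-token prediction (a bigram model, not a transformer).
import numpy as np

corpus = "hello world"
vocab = sorted(set(corpus))
stoi = {ch: i for i, ch in enumerate(vocab)}
V = len(vocab)

# One trainable weight matrix: logits for "next character" given "current character".
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, V))

pairs = [(stoi[a], stoi[b]) for a, b in zip(corpus, corpus[1:])]
lr = 0.5

for step in range(200):
    grad = np.zeros_like(W)
    for cur, nxt in pairs:
        logits = W[cur]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        d = probs.copy()
        d[nxt] -= 1.0                    # gradient of softmax + cross-entropy
        grad[cur] += d
    W -= lr * grad / len(pairs)          # tune the "inner variables"

# After training, the model assigns high probability to the characters
# that actually follow 'l' in the corpus.
final = np.exp(W[stoi["l"]]) / np.exp(W[stoi["l"]]).sum()
print({vocab[i]: round(float(p), 2) for i, p in enumerate(final)})
```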
Many factors are accounted for when training the model, such as attention - i.e., how much influence or weight each word carries relative to the other words in the sentence.
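For intuition on attention itself, below is a hedged numpy sketch of scaled dot-product self-attention. A real transformer adds learned projection matrices, multiple heads, and masking; the vectors here are made-up values.

```python
# Scaled dot-product self-attention on three made-up "word" vectors.
import numpy as np

def attention(Q, K, V):
    """Return a weighted mix of V, where the weights reflect how well
    each query matches each key."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # similarity between words
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax -> attention weights
    return weights @ V, weights

# Three "words", each represented by a 4-dimensional vector (illustrative values).
x = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 2.0, 0.0, 2.0],
              [1.0, 1.0, 1.0, 1.0]])

out, weights = attention(x, x, x)   # self-attention: queries, keys, values all from x
print(weights.round(2))             # each row shows how much one word attends to the others
```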
For more information on how it works on a more technical level, read
ChatGPT was the first to truly open this functionality in a user-friendly way to the public, which is both a fantastic and scary thing given its parabolic growth.
Most of the issues that come from GPT-based AIs like ChatGPT lie within this quote:
"At the core, GPT-3, like other AI models, is only as good as the data it has been trained upon and humans create this data. The same beliefs, biases, errors and falsehoods we hold are reflected in the AI responses. And since tools like ChatGPT come across as intelligent, objective and confident, we tend to believe what these models give us." -
The primary problem with these models is the data they’re being fed. Before an AI is made useful, it has to consume, interact with, and be tested against billions of words and parameters. These datasets are usually filtered and curated to contain specific information.
In the case of ChatGPT, it derives its data from the Internet - which enables it to have a plethora of different solutions at its fingertips (does an AI have fingertips?).
However, this also means it can bring some of the darker sides of the Internet and its biases with it.
The problem isn’t with the AI itself - it’s with tracking the training and data-collection processes that create it.
If one could track and trace, with a degree of certainty and transparency, the history of a model’s training over time, its sources, and its overall journey, then much better judgments could be made about how much confidence to place in the results it produces.
It would also make the value of more focused models - those with a specific purpose, motive, and curated data - far more apparent.
To be clear, OpenAI is aware that models can be biased, and that a robust source of truth needs to be established at some point.
And what better technology to keep an immutable, transparent, and chronological record of the creation of an AI than a distributed, fault-tolerant ledger?
Most see AI as a sort of “black box” of functionality, where the origin of the data, how it was gathered, the circumstances under which it was collected, and how the model operates all remain unknown.
However - what if whenever a new AI was created, each relevant process was submitted onto a ledger for the public to view, so they are aware of exactly how the AI operates based on the given data?
Blockchains are good at keeping a verifiable, unbiased record of truth.
Obviously, this would only apply to public-facing AIs like ChatGPT. Everything from the dataset, to who was involved, to essential parameters, to any potential biases could be kept on-chain.
As the AI progressively trains and improves, each update is also recorded on the ledger in real time. This way, even the developers responsible for its training would get a clean, chronological view of exactly how the AI is performing.
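As a rough illustration (not a real blockchain integration), the sketch below shows what such an append-only, hash-chained training log could look like. Every field name and value is an assumption; a real deployment would anchor these records on an actual chain rather than an in-memory list.

```python
# A hypothetical append-only, hash-chained log of training events.
import hashlib
import json
import time

class ProvenanceLedger:
    def __init__(self):
        self.entries = []

    def record(self, event: dict) -> dict:
        """Append a training event, chained to the hash of the previous entry."""
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"timestamp": time.time(), "prev_hash": prev_hash, "event": event}
        body["hash"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append(body)
        return body

    def verify_chain(self) -> bool:
        """Recompute every hash to confirm no entry was altered after the fact."""
        prev_hash = "0" * 64
        for entry in self.entries:
            expected = {k: v for k, v in entry.items() if k != "hash"}
            recomputed = hashlib.sha256(
                json.dumps(expected, sort_keys=True).encode()
            ).hexdigest()
            if entry["prev_hash"] != prev_hash or recomputed != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True

# Illustrative entries only - the hashes, curators, and loss value are placeholders.
ledger = ProvenanceLedger()
ledger.record({"step": "dataset_ingested", "dataset_sha256": "<hash>", "curators": ["<team>"]})
ledger.record({"step": "checkpoint_saved", "model_sha256": "<hash>", "eval_loss": 2.31})
print(ledger.verify_chain())  # True, until someone tampers with an entry
```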
More importantly, the ledger would provide a direct source of truth backed up by the provenance of the AI’s creation.
In other words - we keep the AI accountable from its creation, tracking its origin, its motives, and exactly how it was influenced at the training level.
It would ensure the consistency and provenance of data at a time when data integrity is increasingly hard to guarantee. Using a record-keeping system like a blockchain, we could trace each byte of data an AI consumes back to its origin.
This would help identify biases that may be hard to detect inside the black box of AI and prevent the propagation of false data from a “malicious” AI.
Think about it like a verification checkmark. If the AI has a checkmark, then it’s valid. If not, then there is reason to doubt its legitimacy.
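Sticking with that analogy, the “checkmark” could be as simple as recomputing a hash of the model you were actually served and comparing it against the hash recorded on the ledger. The file name and published hash below are placeholders.

```python
# Compare a local model file against a hash published on the ledger.
import hashlib

def verify_model(weights_path: str, published_hash: str) -> bool:
    digest = hashlib.sha256()
    with open(weights_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == published_hash

# verify_model("model.bin", "<hash recorded on-chain>")  # True -> "checkmark"
```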
As shown on blockchains like Polkadot, it’s also entirely possible to have organizations vote on certain rules and mechanisms on-chain. A similar concept can be done for AI, where votes can take place to determine various factors regarding its legitimacy, the integrity of data, and more.
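As a toy, off-chain sketch of that idea, the snippet below tallies token-weighted votes on a claim about a model (say, “this dataset was audited”). The threshold, weights, and voter names are entirely made up, and a real system would run this logic in on-chain governance rather than a Python script.

```python
# Token-weighted approval voting on a claim about a model (illustrative only).
from dataclasses import dataclass

@dataclass
class Vote:
    voter: str
    weight: float   # e.g. staked tokens
    approve: bool

def tally(votes: list[Vote], approval_threshold: float = 0.66) -> bool:
    total = sum(v.weight for v in votes)
    approved = sum(v.weight for v in votes if v.approve)
    return total > 0 and approved / total >= approval_threshold

votes = [Vote("auditor_a", 40, True), Vote("auditor_b", 35, True), Vote("lab_c", 25, False)]
print(tally(votes))  # True: 75% of the voting weight approved the claim
```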
After all, these models are only as good as the data fed to them.
Over time, that data could become convoluted. Who controls the source? And what’s to say the source won’t change into something harmful?
Granted, OpenAI does have its Moderation API - another model that detects content deemed harmful - which is a very valuable step in the right direction.
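For reference, here is a hedged sketch of calling that Moderation endpoint with the pre-1.0 `openai` Python SDK; method names differ in newer SDK versions, and the key handling and sample text are illustrative.

```python
# Screening a piece of text with OpenAI's Moderation endpoint (pre-1.0 SDK style).
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Moderation.create(input="Some user-generated text to screen.")
result = response["results"][0]

if result["flagged"]:
    # Categories such as hate, self-harm, and violence are scored individually.
    print("Flagged:", [name for name, hit in result["categories"].items() if hit])
else:
    print("Content passed moderation.")
```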
However, even for factual material - history, for instance - Internet-based data needs to be vetted and checked many times over.
As more of the public rely on these services, ensuring reliable information will be crucial.
There's no doubt that AI will change the world. ChatGPT showed the public just how quickly this technology could change their livelihoods, practically overnight.
Ensuring the integrity of the AI is the next step. Verifying the data that goes into it, who developed it, and its exact motives/goals will be crucial to maintaining ethical standards - and, in turn, public confidence in such models.
It’s really starting to feel like web3 now!