
Is Selling Your Data Worth It?

by Edem Gold, July 26th, 2023

Too Long; Didn't Read

In this article, the author discusses the growing importance of data in the age of AI and compares data to oil in terms of its value. The rise of AI capabilities has increased the demand for data, leading companies to seek more data to fuel their AI algorithms. The article explores the theory of selling data and its potential dangers, including loss of leverage, financial dependence on companies, surrendering of legal rights, and disruption of market dynamics. Instead of selling data, the author proposes alternative solutions, such as data ownership and control through data vaults and data portability, enhanced privacy measures like government-backed regulations and privacy-enhancing technologies, adopting transparent data practices, and collaborative data approaches like federated learning and data trusts. The goal is to find a safe and responsible way to transfer data to companies for AI development without compromising individual rights and privacy.


“Data is the new Oil” -Clive Humby


When mathematician Clive Humby said that data is the new oil, he meant that data could become as rare and valuable to the running of the world as oil. With the rising popularity of applications built on Large Language Models, like ChatGPT and Bing Chat, data has become more useful than ever.


But the astronomical value is where the similarities between data and oil end. Where oil forms from the remains of ancient organisms (mostly marine plankton and algae) buried and compressed over millions of years, data is obtained from the sum of human interactions with various technologies: websites, platforms, apps, etc.


As has been established, data is incredibly useful and will only become more so as dependence on AI models increases. But one bottleneck with data as a resource is what has come to be known as the Data Transfer Problem: how people's data can be handed over to the companies that need it in a safe and responsible way.


In this article, I hope to explain the growing need for data, the dangers of selling our data, and alternatives to that approach that will ensure we enjoy the positives of data as a resource rather than the negatives.


How I hope to approach this

The article will be divided into 3 sections.


  • The Data Explosion: In this section, I will describe data and the data transfer issue, explain why AI companies require data to fuel their AI algorithms, and how this has led to an increased demand for data.


  • The Theory of selling data: Here I will present the dangers of selling our data to companies so they can continue to build AI models and remain competitive.


  • Alternatives to Selling Data: Here I will present alternative solutions for how we can responsibly transfer our data to these companies, solutions which will be better for us, in the long term, than selling our data to companies.


Let us begin!


The Data Explosion

To fully understand the reason why there’s a growing need for data amongst companies, we need to first understand what data actually means.


Data is a word that has become commonplace in today's society, with so many reports of data leaks, the inappropriate collection of data by big tech companies, and so on.


Data is information that is collected and stored in a format that can be processed by a computer. It can be in various forms such as numbers, text, images, and videos, and it can be collected, stored, and analyzed to extract insights and inform decisions.


In simple words, data is the sum of human interactions with technology. Technology here means platforms, websites, web apps, mobile apps, etc.


For example, when you scroll through Twitter or Instagram, you generate data. When you watch, like, and share a YouTube video, you generate data.


Now that we know what data actually is, we can look at why its value has increased. There are two interdependent reasons: the rise of AI capabilities and the growing need for data by companies. Let’s pursue these further.


The Rise of AI Capabilities

To fully understand how the rise of AI capabilities has increased the demand for data, you must first understand how AI systems work. On that note, here is a quick, high-level, non-technical overview.


AI works by learning from data; this process of learning from data is called Machine Learning. Modern AI systems, like ChatGPT, Bard, etc., make use of a particular Machine Learning technique called Deep Learning.


Deep Learning is a subset of Machine Learning which makes use of Artificial Neural Networks to discover and understand complex patterns within data. Here is the crux of the matter: deep learning powers almost everything the public comes in contact with as AI, from the Twitter algorithm which recommends Tweets to the filters we see on Snapchat.


From the explanation above, it is easy to infer that Deep Learning powers most of today’s amazing AI applications, and that deep learning algorithms generally perform better the more data they have access to. This gives rise to a direct relationship between AI capabilities and data demand: an increase in AI capabilities/functions tends to bring a corresponding increase in the demand for data.
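
To make the "more data, better model" idea a little more concrete, here is a minimal sketch in Python. It is only a toy: it fits a one-parameter linear model rather than a neural network, and the function names (fit_slope, mean_abs_error), the noise level, and the sample sizes are all assumptions I made up for illustration. It shows the general statistical tendency, not how any real AI system is trained.

```python
import random

TRUE_SLOPE = 2.0  # the hidden pattern the toy model has to learn (assumed value)

def fit_slope(n_samples, rng, noise=1.0):
    """Least-squares slope through the origin, fitted on n noisy samples."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    ys = [TRUE_SLOPE * x + rng.gauss(0.0, noise) for x in xs]
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx

def mean_abs_error(n_samples, trials=200, seed=0):
    """Average distance between the learned slope and the true one."""
    rng = random.Random(seed)
    errors = [abs(fit_slope(n_samples, rng) - TRUE_SLOPE) for _ in range(trials)]
    return sum(errors) / trials

# The same learner, given more and more data to learn from.
for n in (10, 100, 1000):
    print(n, round(mean_abs_error(n), 3))
```

On average, the error shrinks as the number of samples grows, which is the same pressure that pushes companies to look for more data.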


The Growing need for data by Companies

The ability of AI to perform repetitive tasks, process large amounts of information, and mimic human-like decision-making means it’s in large demand by companies.


According to a survey by NewVantage Partners, 91% of leading companies are investing in AI in order to stay competitive.


The competitive advantage AI offers firms in data analysis, ads, customer insights, etc. translates into a very profitable lead over competitors, and firms with superior AI systems are likely to capture larger shares of the market. And as we established in the “Rise of AI Capabilities” section, an increase in AI capabilities leads to an increase in the demand for data.


The Theory of Selling Data: A Controversial Approach

We’ve established that data is crucial to the development of AI and the continued competitiveness of companies. In this section, we are going to speak about the data selling theory.


The Data Selling theory is essentially an approach that states that since the personal data of individuals is being used to build AI systems and applications, those individuals should be paid for it.


In other words, the foundational question behind the theory is this: if companies use data to power their AI algorithms, then why shouldn’t the people responsible for this data be compensated?


At first glance, this approach seems sound. I mean, generating data is something we human beings already do on a daily basis by scrolling through Twitter or Instagram.


So it wouldn’t require any extra effort on our part, and who wouldn’t like a quick buck? I mean, psychologists pay test subjects, right?


However, after a deeper analysis, the dangers of this approach become apparent. Selling our data has long-term consequences that far outweigh the short-term benefits and we are going to cover those consequences.

Loss of Leverage

The exchange of data for monetary compensation takes away our ability to check and balance these companies.


When we, as users, receive payment from companies for the data we generate, we lose the ability and incentive to hold them accountable for how they use it. We have essentially handed over ownership of the data, and with it the moral and legal leverage we had to hold companies accountable.


And the leverage we have over these companies is crucial to ensuring their applications are not harmful to us as users and human beings.

Financial Dependence on Companies

Receiving payment for our personal data, in the long run, will result in financial dependence on these companies. Imagine a future where people come out of college and become professional Data generators for companies.


It creates a power imbalance because when these companies have control of our source of livelihood, they can easily control our incentives and as a result negatively impact our privacy, freedom, and democracy.


Imagine a future where a company can threaten to cut you off from your data generation job unless you vote for a particular political candidate or, worse still, uses your data to create ads specifically targeted at you and uses said ads to make you do its bidding.

Surrendering of Legal Rights

Accepting payment for our data will undoubtedly require us to sign over our constitutionally guaranteed rights to these companies.


This makes us uninvolved and powerless to stop them from using the data generated for harmful reasons.

This is particularly dangerous when we think about the amount of power granted to these companies.


Essentially, we are handing full control of advanced artificially intelligent systems to companies whose sole motive is to make a profit. This can have extremely harmful consequences.

Disruption of Market Dynamics

As we’ve established earlier, AI systems get better the more data they are given. It stands to reason, then, that the companies with the most data will have the best AI systems (assuming an equal distribution of technical talent).


Now if data is being bought then, per the laws of demand and supply, the companies with the most money will be able to buy the most data, have the better AI, and as a result gain more market share.


Beyond shutting out new entrants, concentrating technology as powerful as AI in the hands of a few companies can have disastrous consequences for society. For instance, these companies would determine whether AI gets implemented in finance and defence rather than education and healthcare.

Alternatives to selling our Data

We’ve covered why selling our data to companies is bad. But the data transfer dilemma is a serious one that needs to be solved safely if humanity is to take full advantage of AI, and no problem has ever been solved by ignoring it.


On that note, in this section, we will discuss safe ways to solve the data transfer problem.


It should be noted that the ideas in this section will be shared under collective categories which contain interconnected ideas.

Data Ownership and Control

People should have the full liberty to own their data and make use of it as they deem fit. This can be done through the use of Data Vaults and Data Portability.


  • Data Vaults: Think of Data Vaults as data storage where users can safely place their data and trust that it can’t be accessed by companies or anyone else without their consent.


  • Data Portability: The idea of data portability is closely related to data vaults. It is essentially a system which enables users to transfer their data from one system to another, for example from a given platform into a data vault.
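
To show how a data vault and data portability could fit together, here is a minimal sketch in Python. Everything in it is hypothetical: the DataVault class, its method names, and the "acme-ai" requester are made up for illustration, and a real vault would need encryption, authentication, and audit logs on top of this.

```python
import json
from dataclasses import dataclass, field

@dataclass
class DataVault:
    """Toy in-memory vault: records are readable only with the owner's consent."""
    owner: str
    records: dict = field(default_factory=dict)
    consents: set = field(default_factory=set)  # (requester, key) pairs

    def store(self, key, value):
        self.records[key] = value

    def grant(self, requester, key):
        self.consents.add((requester, key))

    def revoke(self, requester, key):
        self.consents.discard((requester, key))

    def read(self, requester, key):
        # The owner can always read; everyone else needs explicit consent.
        if requester != self.owner and (requester, key) not in self.consents:
            raise PermissionError(f"{requester} has no consent to read {key!r}")
        return self.records[key]

    def export_json(self):
        """Data portability: give the owner everything in a machine-readable format."""
        return json.dumps(self.records, sort_keys=True)

vault = DataVault(owner="alice")
vault.store("watch_history", ["video-1", "video-2"])
vault.grant("acme-ai", "watch_history")
print(vault.read("acme-ai", "watch_history"))
vault.revoke("acme-ai", "watch_history")  # consent can be withdrawn at any time
print(vault.export_json())
```

The point of the design is that consent is stored with the data and can be revoked, so access depends on the user rather than on the company.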

Enhanced Privacy Measures

In order for safe data transfer to occur between users and companies, there have to be measures set in place to protect users and companies. We will discuss some of them below.


  • Government-Backed Regulations: Government-backed regulations will go a long way in promoting safe data transfer between users and companies. Regulations like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) are examples of government-backed regulatory measures which protect the rights of users against companies.


  • Privacy Enhancing Technologies: Ensuring companies adopt privacy-enhancing technologies like end-to-end encryption and homomorphic encryption will help protect the privacy of users’ personal data, thereby building trust in the data transfer process.
  • Anonymization Techniques: Techniques like data anonymization make it possible for users to allow companies access to their data while keeping personally identifiable information like their names, addresses, etc. private.
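
As a rough illustration of the anonymization idea, here is a small Python sketch that drops direct identifiers from a record and replaces the user ID with a keyed hash. The field names and the pseudonymize function are assumptions for this example. Strictly speaking this is pseudonymization, a weaker cousin of anonymization: the remaining fields can sometimes still be combined to re-identify a person, so treat it as a first step rather than a guarantee.

```python
import hashlib
import hmac

# Assumed names of fields that directly identify a person.
DIRECT_IDENTIFIERS = {"name", "address", "email"}

def pseudonymize(record, secret_key, id_field="user_id"):
    """Drop direct identifiers and replace the ID with a keyed hash (HMAC-SHA256)."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hmac.new(secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256)
    # A shortened hex digest stands in for the real ID; it is stable per key.
    cleaned[id_field] = digest.hexdigest()[:16]
    return cleaned

record = {
    "user_id": 42,
    "name": "Ada Example",
    "address": "1 Example Street",
    "age_band": "30-39",
    "videos_watched": 17,
}
print(pseudonymize(record, secret_key=b"keep-this-key-private"))
```

Because the hash is keyed, only whoever holds the key (ideally the user or a trusted intermediary) can link the pseudonym back to the original ID.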

Adopting Transparent Data Practices

In order for safe and positive data transfer to occur, companies must adopt consensual and transparent data practices. Let us discuss some of them.


  • Informed Consent: Before making use of users’ personal data, companies must seek permission, and users must be aware of what their data is being used for so they can make educated decisions when asked to share it.
  • Transparent Operational Policies: Companies must be mandated to provide clear and easily understandable policies on data collection, processing, and usage.
  • Reformed Data-Centric Business Models: New and emerging business models which support privacy and compensation for individuals, without requiring them to transfer ownership, should be created.

Collaborative Data Approaches

Novel data collaboration techniques need to be adopted and become industry standard. Some of these techniques will be expanded upon below.


  • Federated Learning: Federated Learning is a technique used for training Machine Learning models. Rather than companies having to collect and store user data on central servers before using it to train AI models, Federated Learning makes it possible to train AI models on decentralized devices.


    Each device trains the model on its own data and shares only the resulting model updates, so the raw data never has to be sent to central servers. Companies never store your data but can still use it to build AI models.


  • Data Collaboratives: Data Collaboratives are a relatively new collaboration model where companies, researchers, and organizations (like schools) pool their anonymized data and use it to train AI algorithms, in an effort to recycle data.


  • Data Trusts: Data Trusts are organizations that serve as intermediaries and hold onto users’ data in an effort to ensure safe and ethical use by companies.
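
To give a feel for how federated learning keeps raw data on the device, here is a minimal Python sketch of federated averaging, one common form of the technique. Each simulated device trains a tiny one-parameter model on its own data and sends back only the learned weight, which the server averages. The function names, the learning rate, and the example data are all assumptions for illustration. Real systems train neural networks and add protections such as secure aggregation.

```python
def local_update(weight, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps for y ≈ weight * x on one device's data."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

def federated_round(global_weight, clients):
    """One round of federated averaging: only weights leave each device."""
    total = sum(len(data) for data in clients)
    updates = [(local_update(global_weight, data), len(data)) for data in clients]
    # Weight each device's result by how many examples it trained on.
    return sum(w * n for w, n in updates) / total

# Each inner list stays on its own (simulated) device; every point follows y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (3.0, 9.0)],
    [(2.5, 7.5)],
]

weight = 0.0
for _ in range(20):
    weight = federated_round(weight, clients)
print(round(weight, 3))
```

The server ends up with a model that reflects every device's data even though it only ever saw the weights.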

Summary

We have come to the end of the piece! Data is the lifeblood of AI and AI is crucial to the growth and continued sustenance of humans as a species. The Data Transfer problem is a dilemma we need to solve in order to take full advantage of AI systems and I hope this article has helped a bit in clarifying that.


Also published here.


The lead image for this article was generated by HackerNoon's AI Image Generator via the prompt "people selling data".