Who Will Eventually Control Big Data in Web3?

Written by web3tales | Published 2023/05/11
Tech Story Tags: web3 | data-privacy | data-engineering | big-data | database | facebook | decentralized-internet | decentralized-web

TL;DR: As the internet moves closer to the era of decentralization with Web3, it's important to understand clearly how data control will be affected. Web3 appears to offer a safer online environment with improved data privacy and security. Although adoption is still in its early stages, it is crucial to prepare for what to anticipate in the wake of the next wave of the 21st-century internet.

While Web3 is screaming decentralization, the likes of Facebook, Amazon, and Google are hoping it's just a false alarm.

As the internet moves closer to the era of decentralization with Web3, it's important to understand clearly how data control will be affected and the likely turn data engineering will take.

The internet has gone through many stages of evolution since its inception, moving from a mere news board to a place for interaction through social media.

As online experiences gradually improve, there are growing concerns about how much control people keep over their data in the bid to enjoy personalized services.

What will be the fate of the tech giants that depend on vast quantities of user data to offer personalized user experiences, a strategy that has served them well so far?

This is a good time to quote Aaron Swartz briefly:

Information is power. But like all power, there are those who want to keep it for themselves.

Web3 appears to offer a safer online environment with improved data privacy and security. Although adoption is still in its early stages, it is crucial to prepare for what to anticipate in the wake of the next wave of the 21st-century internet.

This is a tale of the data control dilemma that arises as decentralization and loudly preached transparency take hold.

Let the tales begin!

We will start with the basics.

What Is Web3?

Web3 is the next stage of the internet's evolution.

It is powered by technologies such as the blockchain, decentralized protocols, and cryptographic systems that secure stored data with cryptographic algorithms. Web3 moves away from the centralized web seen in Web2 toward a decentralized one.

Still in doubt? Listen to Aya Miyaguchi, the Executive Director of the Ethereum Foundation.

Web 3 is about creating a more decentralized and open internet where users have control over their data and interactions, without relying on centralized intermediaries for access and validation.

What Are Cryptographic Algorithms?

Imagine you have a secret message to send to your friend. You want to send it through the internet, but you're not sure of the safety of that message.

You don't want anyone else to know what it says, so you decide to write the message in a special code that only you and your friend can understand.

Depending on how close you are, you may decide to twist some words to mean something else that only the two of you can decode.

That's where cryptography comes in. The algorithm is the special set of rules applied to your coded message so that your friend, and only your friend, can turn it back into the original.

Now, let's get a definition.

Cryptographic algorithms are sets of rules that turn readable messages into coded ones that only the parties involved in the interaction can read.
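To make the secret-message idea concrete, here is a minimal toy sketch of a shared-key scheme (a one-time pad built on XOR) using only Python's standard library. The function names `make_key` and `xor_bytes` are made up for this illustration, and real applications should rely on vetted cryptographic libraries rather than anything hand-rolled like this.

```python
import secrets


def make_key(length: int) -> bytes:
    """Generate a random shared key as long as the message (one-time pad)."""
    return secrets.token_bytes(length)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching key byte.

    Applying the same key a second time restores the original bytes,
    so this one function both encodes and decodes.
    """
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = "Meet me at noon".encode("utf-8")
shared_key = make_key(len(message))          # known only to you and your friend

ciphertext = xor_bytes(message, shared_key)  # what travels over the internet
decoded = xor_bytes(ciphertext, shared_key)  # what your friend recovers

print(decoded.decode("utf-8"))  # Meet me at noon
```

Anyone who intercepts `ciphertext` without `shared_key` sees only random-looking bytes, which is the "special code" from the story above. The key must never be reused for a second message.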

Privacy is a fundamental pillar of Web3. The cryptographic algorithms used to store data on the blockchain protect that data from tampering, and encryption can be used to keep personal information confidential.

By prioritizing privacy and security, Web3 empowers individuals to take control of their data and protects them from the invasive practices of centralized entities.

What Is a Decentralized Web?

This term describes an internet environment that doesn't rely on a single entity like Google, Facebook, or Yahoo for control.

Instead, it uses blockchain technology to enable direct peer-to-peer interaction between users: all the computers and devices are equal and can communicate with one another directly.

In the Web 3 world, we don't need middlemen to trust each other. We can trust the system itself.


- Joseph Lubin, Co-founder of Ethereum & CEO of ConsenSys

Imagine walking into a room and interacting with everyone directly, without hassle. That's what peer-to-peer feels like!

Web3 Is Coming…

Thanks to the early adopters of Web3, there are gradually more promising Web3 alternatives for day-to-day tasks that pledge to provide better experiences in terms of data security and speed, such as:

  • Brave for browsing

  • MetaMask to send money

  • OpenSea for buying and selling NFTs

  • Mastodon to have more control over your data while still enjoying the benefits of social media

  • Filecoin to store and manage data rather than uploading it to your Google Drive


Data storage already has Web3 alternatives. Yeah! As an anonymous source puts it, "Whoever controls the data, controls the truth." Rightly so!

Web3 gives individuals more control so you can have a stake in decision-making, and that's how life should be! Data is a valuable asset, sometimes likened to gold, especially in this era where it is used to improve user experience and help businesses make decisions that lead to growth.

In the Web2 space, centralized data management leaves little room for democracy and transparency in how data is handled.

Centralization is not bad

We have all benefited from personalized services, like Amazon suggesting what you might like based on the last thing you searched for or even bought.

Web3 is a new layer of the internet that largely builds on the groundwork laid by centralized systems to make data more accessible and secure.

Decentralization doesn't always mean private

Like many newbies in this ecosystem, you may easily assume that decentralization guarantees privacy. No, it doesn't.

"Many decentralized applications (dApps) and smart contracts built on decentralized networks like Ethereum collect and store data in transparent, publicly accessible databases."

Too Many Broken Trusts

Many times we have been forced to trust tech lords like Facebook with our data, and all too often they have broken that trust.

Take the Facebook-Cambridge Analytica scandal, which broke in March 2018: millions of Facebook users had their data collected without their consent and used for political advertising.

Cambridge Analytica, a British political consulting firm, obtained tons of people's data quite easily through a third-party app called This Is Your Digital Life.

The app helped the firm obtain personal details of the individuals who signed up, as well as details of their friends on Facebook, who had never given their consent.

After that incident, in which the data of up to 87 million users was harvested, it became clearer to the public that Facebook owns and controls the data of everyone on its platform. Even now, users have little visibility into how accessible their personal data is.

Web3 exists to prevent such scandals by giving you the power to control your data. It also promises to keep your data safe by removing the reliance on centralized third-party networks for data storage, which are vulnerable to attacks.

Speaking of centralized networks and servers, this is a good time to bring in the discussion on the synergy between big data, data engineering, and centralized servers.

Data Engineering & How It Plays Out in Web2

Data engineering refers to the process of building infrastructure, with tools such as Apache Spark, Hadoop, and Cassandra, to collect and transform big data for business use and, ultimately, growth.

Big data, on the other hand, refers to large amounts of data (structured, unstructured, and in some cases semi-structured) generated daily through the internet by individuals and businesses.

The importance of data engineering cannot be overlooked: tons of data (big data) are constantly collected, harnessed, and processed through pipelines to provide valuable business insights and inform decisions.

Now, a major limitation is that data engineering typically relies on centralized systems for data storage and is therefore prone to a single point of failure. In a centralized network, data is processed and stored by a single node; if that primary server crashes, the whole service goes down, which could lead to data loss.

Also, individuals and businesses have little or no control over their data, and virtually every business now relies on centralized servers.

What do they say about times and seasons? Well…

Here are some limitations of centralized cloud computing:

  • Users' data can be stored and used without their consent. In the case of Google, personal data such as search history, personal preferences, and location can be used arbitrarily.

  • Data is prone to being hacked by or exposed to third parties, as the Facebook scandal showed.

  • Operations slow down as load grows, which stifles scalability, because everything depends on the primary servers (less division of labor).

  • Servers are vulnerable to a single point of failure, as explained above.


Is that the path data management should continue on? I don't think so. Thankfully, there seems to be a way out: with the "decentralized mandate" of Web3, multiple nodes can process and store large amounts of data.

Now, let me help you understand this.

Decentralized storage protocols can be likened to keeping your eggs in different baskets, all in the same room, with each basket well secured and accessible only with permission (on-chain).

Centralized storage servers can be likened to keeping all your eggs in one relatively safe basket. When everything sits in one basket, a single mishap puts every egg at risk. That's how vulnerable centralized servers are.

Notice that, in the first scenario, the eggs are all in the same room, so there's still a connection between them. That represents the interconnection between decentralized nodes.

Just like the eggs in the first scenario, data placed across interconnected nodes or blocks is more secure, and it is easier to detect when something goes wrong.
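The basket analogy can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Node` and `ReplicatedStore` classes are hypothetical, every record is simply copied to every node, and the fingerprints are kept in one place for brevity, which a real decentralized network would not do.

```python
import hashlib


class Node:
    """One 'basket': a single machine holding its own copy of each record."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.records: dict[str, bytes] = {}


class ReplicatedStore:
    """Writes every record to all nodes and reads from any healthy copy."""

    def __init__(self, nodes: list[Node]) -> None:
        self.nodes = nodes
        self.fingerprints: dict[str, str] = {}  # record id -> SHA-256 hex digest

    def put(self, record_id: str, data: bytes) -> None:
        self.fingerprints[record_id] = hashlib.sha256(data).hexdigest()
        for node in self.nodes:
            node.records[record_id] = data

    def get(self, record_id: str) -> bytes:
        expected = self.fingerprints[record_id]
        for node in self.nodes:
            if not node.online:
                continue  # skip crashed nodes
            data = node.records.get(record_id)
            if data is not None and hashlib.sha256(data).hexdigest() == expected:
                return data  # first copy that still matches its fingerprint
        raise LookupError(f"no healthy, untampered copy of {record_id!r}")


nodes = [Node("a"), Node("b"), Node("c")]
store = ReplicatedStore(nodes)
store.put("profile:1", b"alice")

nodes[0].online = False                     # one node crashes
nodes[1].records["profile:1"] = b"mallory"  # another copy is tampered with

print(store.get("profile:1"))  # b'alice'
```

With a single basket, either of those two events would have meant downtime or bad data. Here the read still succeeds because one intact copy remains, and the hash comparison is what makes the tampered copy easy to spot.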

But then, there's still a problem.

Navigating the Early Stage of Innovation Illuminates a Myriad of Challenges

The emergence of true data sovereignty in a decentralized environment has uncovered many holes that should be filled.

At this stage, with Web2 still fully in existence, many applications, including Web3 dApps, use centralized servers because decentralized storage protocols are not yet equipped to handle massive data.

Web3 developers crave a more sophisticated decentralized database environment, given the massive and complex nature of the data generated in the ecosystem. Let's face it: with millions of transactions and many blocks added daily, developers need to access and query data in real time.

Speed is a necessity for dApps to scale efficiently.

Here are some of the problems faced by developers:

  • Some existing decentralized storage protocols like Arweave and IPFS are limited to the storage of only static files.

The abundance of Web3 applications means more complex data to deal with like NFT data, tokens, wallets, etc.

Given the dynamic nature of Web3 data, these distributed protocols make it difficult for developers to query data easily, leaving most of the data stored in silos unused. Such a rigid storage environment also slows down deployment.

  • These vast amounts of data lie fallow in silos and go to waste because they can't be accessed easily when needed, that is, in real time. This affects the application lifecycle.

  • Cost implications: The time and expense required to build a database infrastructure robust enough for the complexities of Web3 tempt developers to stick with the centralized databases they already know. Building one from scratch is a complex process, and there is little time to do that while trying to scale.

  • One of the pillars of a Web3 environment is trust. Developers need to trust that they can get reliable data from the emerging decentralized storage infrastructures available.

  • Storing data directly on a blockchain takes a lot of time and offers no speed advantage over centralized servers; transactions are slower and take longer to validate. In a bid to solve this, developers might resort to building a decentralized on-chain database. While this would be more effective, it is an extremely complex task to get started with.
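To see why static, hash-addressed files are awkward to query, consider this minimal sketch. The `ContentStore` class is a hypothetical stand-in written for this article, not the API of IPFS or Arweave, and the NFT records are made-up sample data.

```python
import hashlib
import json


class ContentStore:
    """Toy content-addressed store: the key is the SHA-256 of the bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]

    def scan(self):
        """Yield every blob; the only way to 'query' without an index."""
        yield from self._blobs.values()


store = ContentStore()
nfts = [
    {"token_id": 1, "owner": "0xA", "collection": "cats"},
    {"token_id": 2, "owner": "0xB", "collection": "cats"},
    {"token_id": 3, "owner": "0xA", "collection": "dogs"},
]
addresses = [store.add(json.dumps(n, sort_keys=True).encode()) for n in nfts]

# Fetching by address is direct...
first = json.loads(store.get(addresses[0]))

# ...but "all tokens owned by 0xA" needs a full scan, or a separate index
# that a database layer would have to maintain.
owned_by_a = [
    doc["token_id"]
    for doc in map(json.loads, store.scan())
    if doc["owner"] == "0xA"
]
print(owned_by_a)  # [1, 3]
```

A scan over three records is trivial, but over millions of records it is exactly the real-time access problem described above, which is the gap a Web3 database layer tries to fill.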

Who Owns the Data in Web3?

Various Web3 projects may have distinct approaches to data ownership. However, ownership of data is distributed across nodes or computers to enable transparency, censorship resistance, and immutability.

This distribution is carried out by decentralized systems including blockchain technology.

Individuals have full control of their data and can decide which data they want to share.

How Is Data Stored in Web3?

Data storage in Web3 is usually carried out with decentralized technologies like the blockchain, distributed storage networks, and decentralized file systems like IPFS (InterPlanetary File System).

Now we will take them one at a time so you can understand:

  • Blockchain: A force to be reckoned with in the Web3 ecosystem that allows you to store data in blocks and link them securely through cryptography. (Remember the definition above.)

It is usually used to store transactional data in an immutable chain of blocks, hence its name: blockchain.

  • Distributed storage networks: This decentralized storage type pools the storage capacity of participants in the network and incentivizes them to share more of their unused storage space. Stored data is then broken into pieces and distributed across multiple nodes.

  • Decentralized File Systems (IPFS): A peer-to-peer distributed storage environment built on a content-addressable model, a way to identify files by their cryptographic hash. It helps in the storage and retrieval of files.
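Here is a minimal sketch of what "blocks linked securely through cryptography" can look like. It assumes a very simplified block made of an index, some data, and the previous block's hash; the helper names are hypothetical, and real blockchains add consensus, signatures, and much more.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents, including the previous block's hash."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, data: str) -> None:
    """Add a block that points back at the hash of the current last block."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "prev_hash": prev_hash})


def is_valid(chain: list) -> bool:
    """Check that every block still matches the hash its successor recorded."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain: list = []
for tx in ["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"]:
    append_block(chain, tx)

print(is_valid(chain))  # True

chain[1]["data"] = "bob pays mallory 200"  # try to rewrite history
print(is_valid(chain))  # False
```

Because each block stores the hash of the one before it, changing an old block breaks the link recorded by its successor. That broken link is what makes tampering detectable and gives the chain its "immutable" reputation.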

Which Web3 Databases Are Available?

With Web3 buzzing, many developers have noticed a missing piece in the ecosystem's tech stack: a database for Web3. Below are a few emerging databases you can use:

  • Polybase
  • Glacier
  • Ceramic
  • WeaveDB
  • IceFireDB
  • Kwil
  • Space and Time (Web3 data warehouse approach)

Thus Far…

Some Web3 databases have emerged to solve the major pain points Web3 developers face when scaling their dApps efficiently. The handful of examples listed above, among others, take similar approaches by providing:

  • Programmable composable data network: A better interconnected, organized space for Web3 data that allows for easy access and lets developers build on the infrastructure seamlessly.

  • NoSQL database: Traditional relational databases are rigid about structure (tables, rows, and columns), which limits how well they can handle Web3 data that comes in more complex forms. NoSQL databases handle Web3 data in any form with more flexibility and efficiency: different types of data can be stored without having to fit into a predefined schema.

  • Modularity: Just like building blocks, different parts of the environment can be easily put together or taken out as needs change. It's customizable and dynamic in nature.

  • Scalable infrastructure: A robust tool that meets the heavy demands of massive Web3 data without getting overwhelmed or losing efficiency.

  • Interoperability: This is one word you may struggle to pronounce but will regularly come across when talking about Web3 applications. I'll try to help you understand it in the best way possible 🙏

Imagine that you have a flash drive that can easily connect to 3 computers. That drive is able to share information easily with each of the computers.

That's how it works in Web3 too: various dApps can share information with each other and work together even if the applications were built by different people.

It's more like a meeting point of agreement despite the uniqueness of the applications.


Emerging Web3 databases are receptive to allowing various dApps to interact and collaborate seamlessly on their platform. Business applications can interact without intermediaries.

  • Data monetization: Glacier, in particular, intends to combine NFT technology with the ownership of Web3 data to make the monetization of that data a reality.
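The NoSQL point in the list above is easier to picture with a small example. Below is a toy, in-memory document collection in plain Python; the `Collection` class and the sample records are assumptions made for illustration and do not reflect the API of any of the databases named earlier.

```python
class Collection:
    """Toy document collection: records are plain dicts with no fixed schema."""

    def __init__(self) -> None:
        self._docs: list[dict] = []

    def insert(self, doc: dict) -> None:
        self._docs.append(dict(doc))  # store a copy of the caller's dict

    def find(self, **criteria) -> list[dict]:
        """Return documents whose fields equal every given criterion."""
        return [
            doc for doc in self._docs
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


assets = Collection()
assets.insert({"type": "nft", "token_id": 7, "owner": "0xA", "traits": {"fur": "blue"}})
assets.insert({"type": "token", "symbol": "ABC", "owner": "0xA", "balance": 120})
assets.insert({"type": "wallet", "address": "0xB", "label": "savings"})

print([doc["type"] for doc in assets.find(owner="0xA")])  # ['nft', 'token']
```

An NFT, a token balance, and a wallet each have different fields, yet they sit in the same collection and can be queried together. A table with fixed columns would need a schema change, or many empty columns, to do the same.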

Exciting Times Ahead for Data Engineers

As Web3 gradually changes the way everything on the internet is done, data engineering will also take interesting steps toward decentralization.

Data engineers, the core architects of data pipelines, will have to learn new ways to design and maintain decentralized environments that meet the demands of data sovereignty, which covers security, integrity, and privacy through immutability.

Also, businesses will earn more trust when people can see that their data is protected from unauthorized access, deletion, or manipulation.

The transition from Web2 to Web3 is not leaving Web2 applications behind either: some of the databases listed above, like IceFireDB, provide a way for both Web2 and Web3 applications to access decentralized storage.

Automation works like a one-time button for repetitive tasks. Data engineers can employ smart contracts (programmable, self-executing contracts) to automate some data management practices.
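To give a feel for the "self-executing" part, here is a plain-Python simulation of an agreement that grants access to a dataset as soon as its payment rule is met. This is not a real smart contract (those run on a blockchain and are usually written in languages such as Solidity); the `DataAccessAgreement` class, its fields, and the sample names are all hypothetical.

```python
class DataAccessAgreement:
    """Plain-Python stand-in for a self-executing agreement.

    Rule: once a buyer has paid at least `price`, access to the dataset is
    granted automatically, and every state change is appended to a log.
    """

    def __init__(self, owner: str, dataset_id: str, price: int) -> None:
        self.owner = owner
        self.dataset_id = dataset_id
        self.price = price
        self.paid: dict[str, int] = {}
        self.granted: set[str] = set()
        self.log: list[str] = []

    def pay(self, buyer: str, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.paid[buyer] = self.paid.get(buyer, 0) + amount
        self.log.append(f"{buyer} paid {amount}")
        self._settle(buyer)  # the rule runs on every payment, no manual step

    def _settle(self, buyer: str) -> None:
        if buyer not in self.granted and self.paid[buyer] >= self.price:
            self.granted.add(buyer)
            self.log.append(f"{buyer} granted access to {self.dataset_id}")

    def can_read(self, user: str) -> bool:
        return user == self.owner or user in self.granted


agreement = DataAccessAgreement(owner="alice", dataset_id="sales-2023", price=10)
agreement.pay("bob", 4)
print(agreement.can_read("bob"))  # False
agreement.pay("bob", 6)
print(agreement.can_read("bob"))  # True
```

Nobody has to approve the request by hand: the rule is written once and enforced on every payment. On a real chain, the log and the balances would also be publicly verifiable.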

How Will Tech Giants React to the Full Unfolding of Web3?

Many things that once trended eventually faded because they could not withstand change. That's how this will play out, too.

Facebook, Amazon, and Google may not disappear entirely if they can adapt and implement the necessary changes to transition towards a more open and transparent internet.

Facebook has gradually shown support for certain aspects of the Web3 sphere, such as the metaverse and NFTs, despite its recent decision to shut down its NFT platform for Facebook and its sister platform, Instagram.

Everyone needs to learn constantly about the evolving direction of the internet because this next phase is really about you, and you do have a stake.

Additionally, businesses would handle consumer data more transparently, obtaining direct consent through a decentralized web. It's truly an incredible journey to anticipate, even if it unfolds at a leisurely pace.

Takeaway

  • Just as you wouldn't want to depend on another person's phone to make calls, Web3 developers crave a decentralized storage environment that fully meets their needs, rather than turning a blind eye to the many limitations of centralized facilities.

  • Get excited about the new wave of the internet, Web3, where you are more involved and protected.


Written by web3tales | Enjoy Web3 easily without the jargons.
Published by HackerNoon on 2023/05/11