Databases and Blockchains, The Difference Is In Their Purpose And Design

by Vince Tabora (@vincetabora), August 4th, 2018

There is much confusion about what a blockchain is and how it differs from a database. A blockchain is actually a database because it is a digital ledger that stores information in data structures called blocks. A database likewise stores information in data structures called tables. However, while a blockchain is a database, a database is not a blockchain. They are not interchangeable: though they both store information, they differ in design. They also differ in purpose, which is perhaps what is not clear to those who want to understand why blockchains are needed and why databases are better suited for storing certain data.

First, let’s look at the difference between a database and a blockchain.

Database

A traditional database is a data structure used for storing information. This includes data that can be queried to gather insights for structured reporting, which entities use to support business, financial and management decisions. Governments also make use of databases to store large data sets that scale to millions of records.

Databases started as flat-file and hierarchical systems that provided simple information gathering and storage. Later, databases adopted a relational model, which allows more complex queries by relating information from multiple tables. The information stored in databases is organized using a database management system. In a relational database, data is stored in structures called tables. A table has columns (also called fields or attributes) that define the name and type of each piece of data, and rows, each of which is one record stored in the database.
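To make the idea of tables, columns, rows and relations concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table names, column names and sample values are my own assumptions for illustration and do not come from any particular system.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: two tables related through customer_id.
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item        TEXT NOT NULL,
        amount      REAL NOT NULL
    );
""")

# Each inserted tuple becomes one row (record) in its table.
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'John Smith')")
conn.executemany(
    "INSERT INTO orders (customer_id, item, amount) VALUES (?, ?, ?)",
    [(1, "keyboard", 45.0), (1, "monitor", 180.0)],
)
conn.commit()


def total_spent_per_customer(connection):
    """Relate the two tables with a JOIN and aggregate per customer."""
    rows = connection.execute("""
        SELECT c.name, SUM(o.amount)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id
        ORDER BY c.name
    """)
    return rows.fetchall()


print(total_spent_per_customer(conn))  # [('John Smith', 225.0)]
```

The JOIN is the relational part: neither table holds the answer by itself, but relating them through customer_id produces the report.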

A database design

A database can be modified, managed and controlled by a single user called an administrator. A database always has a user that functions as the DB Admin, and that user has complete control of the database. This user can create, modify and delete any record stored in the database. They can also perform administration tasks such as optimizing performance and keeping the database’s size at manageable levels. A large database tends to slow down, so admins run optimization routines to improve performance.

The admin can then delegate roles to other users that allow them to administer or manage the database. For example, an admin can grant a user a role that allows them to create new users for the database. When something goes wrong, the admin and their delegates can restore a database from backup. In the corporate world, problems like this are common. Servers crash, and often the only way to recover data is to restore a database from backup.

A database is also recursive, meaning that you can go back to a particular record and modify or delete it; in other words, its records are mutable. Admins often purge old records that have either been backed up to another database or been deemed obsolete. For example, suppose a record for “John Smith” in the current database needs to be updated with a new residential address. His previous addresses are already backed up in an archived database, so the record in the current database can simply be overwritten with the new address.
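The “John Smith” update can be sketched as follows, again with sqlite3. For brevity I assume the archive is a second table in the same database rather than a separate archived database, and the table names and addresses are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE residents (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE residents_archive (resident_id INTEGER, address TEXT);
    INSERT INTO residents (id, name, address)
        VALUES (1, 'John Smith', '12 Old Road');
""")


def change_address(connection, resident_id, new_address):
    """Archive the current address, then overwrite it in place."""
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "INSERT INTO residents_archive (resident_id, address) "
            "SELECT id, address FROM residents WHERE id = ?",
            (resident_id,),
        )
        connection.execute(
            "UPDATE residents SET address = ? WHERE id = ?",
            (new_address, resident_id),
        )


change_address(conn, 1, "99 New Street")
current = conn.execute(
    "SELECT address FROM residents WHERE id = 1"
).fetchone()[0]
archived = conn.execute("SELECT address FROM residents_archive").fetchall()
print(current)   # 99 New Street
print(archived)  # [('12 Old Road',)]
```

The old value is gone from the live record after the UPDATE. It survives only because it was copied to the archive first, which is exactly the kind of in-place change a blockchain does not allow.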

A Database Uses A Client/Server Architecture

A database is implemented in a client/server architecture from small office/home office to enterprise environments. This is because computers need to connect to the server that hosts the database in order to get information or store data. Originally, a database client using an ODBC connection was configured by an administrator or tech on a client computer to connect to the database. The client software then runs to establish a secure connection which must then be authenticated for access.

Authentication can use an access (connection) string configured by an admin, or users can be given passwords to log in to the database. This is why user accounts are created: to allow access to those who are authenticated and reject those who are not. In most systems today, a web interface is used instead. It still requires authentication for private access, while a public database can be more easily accessed from a website.

We can see that a database requires plenty of control, which makes it highly centralized. It is also permissioned, meaning that it requires user accounts created by an administrator, who then sets privileges on how users can access the database. In a production environment, a DB Admin sets read-only permission for public information in a database. They must then grant a different set of permissions to users who can update and write information to the database. The centralization of a traditional database is what establishes security and trust in the system. Many databases run in private networks behind a firewall in data centers run by big companies. Others are hosted in the cloud and available to the public. They still require an administrator to control them.
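The permission model described above can be sketched as a simple role lookup. This is not how any specific database management system implements it; the role names, privilege names and user accounts below are assumptions made up for illustration.

```python
# Hypothetical roles and the privileges an administrator attaches to them.
ROLE_PRIVILEGES = {
    "public": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "delete", "manage_users"},
}

# Hypothetical user accounts created by the administrator.
USER_ROLES = {"guest": "public", "alice": "editor", "root": "admin"}


def is_allowed(user, action):
    """Return True if the user's role grants the requested action."""
    role = USER_ROLES.get(user)
    if role is None:
        return False  # unknown (unauthenticated) users are rejected outright
    return action in ROLE_PRIVILEGES[role]


print(is_allowed("guest", "read"))    # True: public data is read-only
print(is_allowed("guest", "write"))   # False
print(is_allowed("alice", "write"))   # True
print(is_allowed("mallory", "read"))  # False: no account
```

Every decision flows from tables that one administrator controls, which is the centralization the section describes.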

Blockchain

In my description of a blockchain, I am going to base it on the design used in Bitcoin. There are also private blockchains for enterprise environments, but I will discuss those later. Blockchains have only been around since 2009, when Bitcoin became the first system to implement one. In Bitcoin, the blockchain is an immutable, public digital ledger: a continuously growing distributed database that is cryptographically secured.

A blockchain stores information in blocks, which in Bitcoin are limited to a maximum size. Each block contains a hash of the previous block to provide cryptographic security. The hashing uses SHA-256, which is a one-way hash function. Because the previous block’s hash in turn covers the hash of the block before it, every block is linked all the way back to the very first block produced in the blockchain, called the “genesis block”. The stored hash acts as a pointer to the previous block, so changing any earlier block changes its hash and breaks every link after it. Within each block, the transactions are summarized using a Merkle tree, which is an efficient way to verify data.
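The hash linking between blocks can be sketched in a few lines. This is a simplified model and not Bitcoin’s actual block format: I assume each block is a small dictionary hashed once with SHA-256 over its JSON encoding, and the field names and payloads are hypothetical.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over a canonical JSON encoding of the block."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_chain(payloads):
    """Link each block to its predecessor by storing the predecessor's hash."""
    chain = []
    prev_hash = "0" * 64  # placeholder predecessor for the genesis block
    for index, data in enumerate(payloads):
        block = {"index": index, "data": data, "prev_hash": prev_hash}
        chain.append(block)
        prev_hash = block_hash(block)
    return chain


def is_valid(chain):
    """Check that every block still points at the hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = build_chain(["genesis", "Alice pays Bob 5", "Bob pays Carol 2"])
print(is_valid(chain))  # True

chain[1]["data"] = "Alice pays Bob 500"  # tamper with an earlier block
print(is_valid(chain))  # False
```

Changing one earlier block changes its hash, so the next block’s stored pointer no longer matches. To hide the change, every later block would have to be rebuilt as well.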

A blockchain design

In order for blocks to be added to a blockchain, game theory is involved in the process. Computers that function as nodes in the network, called “miners”, must compete with one another to find a value called the nonce that, when hashed together with the block’s contents, produces a hash below a required target. The miners must use their compute resources to search for this value, and this requires powerful computer hardware. A rule built into the protocol, called the difficulty level, determines how hard or easy it is to find the value, based on the total hashing power in the network.

This means that the more miners there are, the higher the difficulty level is adjusted. This is because with more miners, there are more computing resources available on the network, which increases the hashing power, measured in hashes per second (h/s). Once a miner has validated a block, they receive a reward as an incentive for providing their compute resources to the network. The incentives are the motivation for nodes to mine blocks, since they get rewards in the form of transaction fees and coins. In the Bitcoin protocol, this is called the Proof-of-Work consensus algorithm.
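A toy version of the nonce search looks like this. It is only a sketch: I assume the block is a plain string, a single SHA-256 hash, and a difficulty expressed as a number of leading zero bits, none of which matches Bitcoin’s real header format.

```python
import hashlib


def mine(block_data, difficulty_bits):
    """Try nonces until SHA-256(block_data + nonce) falls below the target."""
    target = 1 << (256 - difficulty_bits)  # smaller target = harder puzzle
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
        nonce += 1


def verify(block_data, nonce, difficulty_bits):
    """Verification needs a single hash, however long the search took."""
    digest = hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


# 16 bits of difficulty takes about 65,000 attempts on average.
nonce, digest_hex = mine("block 1: Alice pays Bob 5", difficulty_bits=16)
print(nonce, digest_hex)
print(verify("block 1: Alice pays Bob 5", nonce, 16))  # True
```

Each extra bit of difficulty doubles the expected number of attempts, while checking a claimed nonce always costs one hash. That asymmetry is what lets every node cheaply confirm work that was expensive to produce.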

As you can see, a blockchain uses a distributed network of nodes that is decentralized. Decentralization means that all nodes on the network store a copy of the blockchain. The nodes either store a full copy (full nodes) of the blockchain or perform mining operations or they can do both. There is no administrator to validate a block of transactions. Instead you have miners that perform this verification by solving cryptographic puzzles based on a difficulty level proportional to the total network hashing power available.
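How the difficulty level follows the network’s hashing power can be shown with a little arithmetic. Bitcoin aims for one block roughly every 10 minutes and adjusts the difficulty every 2016 blocks, limiting each adjustment to a factor of 4. The function below is a simplified sketch of that rule (the real protocol adjusts a target value rather than a difficulty number), and its name and parameters are my own.

```python
TARGET_BLOCK_SECONDS = 600  # Bitcoin aims for one block about every 10 minutes
RETARGET_INTERVAL = 2016    # blocks between difficulty adjustments in Bitcoin


def retarget(old_difficulty, actual_seconds):
    """Scale difficulty so the next interval takes roughly the expected time."""
    expected_seconds = TARGET_BLOCK_SECONDS * RETARGET_INTERVAL
    # Each adjustment is limited to a factor of 4 in either direction.
    clamped = min(max(actual_seconds, expected_seconds / 4), expected_seconds * 4)
    return old_difficulty * expected_seconds / clamped


two_weeks = TARGET_BLOCK_SECONDS * RETARGET_INTERVAL
print(retarget(100.0, two_weeks / 2))  # 200.0: blocks came twice as fast
print(retarget(100.0, two_weeks * 2))  # 50.0: blocks came half as fast
```

If more miners join and blocks arrive in half the expected time, the difficulty doubles, which brings the block interval back toward 10 minutes.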

Once a block has been added to the blockchain, the information is immutable and transparent to all. Blockchain transactions are non-recursive, meaning they cannot be repeated or modified once validated in a block. A blockchain is highly fault tolerant, since if one or more nodes are down, there will always be other nodes available to run the blockchain. Another advantage of decentralization is that it can be permissionless and trustless, allowing people who don’t know or trust each other to transact. What the blockchain does is provide that trust through transparency, by recording the transaction and providing a cryptographically secure way to exchange value.

Blockchains Use A Peer-to-Peer (P2P) Network Architecture

A blockchain uses a peer-to-peer (P2P) network architecture. It does not require access to a centralized database; instead, all participating nodes in the network can connect with each other. There is no “master” that controls all nodes. All peers are equal in how they access the blockchain, and none requires administrator access.

So what if a peer has gone rogue, can they influence the network?

The answer is that in theory they can, if they control the majority of the hashing power. A rogue peer can in theory control the network using what is called a “51% Attack”. It requires an immense amount of computing resources to pull off, to the point that launching an attack costs more than it returns. Using those computing resources for mining is more profitable. A mechanism that makes a blockchain secure is decentralization. If a peer tries to modify any information on a blockchain, it needs support from other peers to validate the change. The modification creates a separate chain from the main network’s, and it only becomes valid if it grows longer than the main network’s chain.
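Why a majority matters can be worked out with the simple catch-up formula from the Bitcoin white paper: an attacker with share q of the hashing power, starting z blocks behind an honest network with share p = 1 - q, eventually overtakes it with probability (q/p)^z when q is less than p, and with certainty otherwise. The function name below is my own, and this is the basic form of the calculation, not a full attack model.

```python
def catch_up_probability(attacker_share, blocks_behind):
    """Chance an attacker ever overtakes the honest chain from z blocks behind."""
    q = attacker_share
    p = 1.0 - q  # share of hashing power held by honest miners
    if q >= p:
        return 1.0  # with half or more of the hashing power, success is certain
    return (q / p) ** blocks_behind


for share in (0.10, 0.30, 0.45, 0.51):
    print(share, catch_up_probability(share, blocks_behind=6))
```

With 10% of the hashing power, the chance of rewriting a transaction buried six blocks deep is on the order of one in a million. At 51%, it is certain given enough time, which is where the attack gets its name.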

A network like Bitcoin contains thousands of nodes, so trying to manipulate data requires changing it on all the other nodes in the network. In reality this is computationally intensive and requires expending vast amounts of electricity, so it is extremely difficult and expensive to do. This is a form of governance to make sure that no one tries to cheat anyone. This is what makes blockchains tamper resistant and immutable. At the same time, a blockchain is transparent, since there is proof that a transaction occurred which everyone can view.

We can clearly see that the design is what makes a traditional database and blockchain different. Let’s summarize some of those features.

Database vs. Blockchain — Main Features

Now let’s point out the advantages and disadvantages of each.

Advantages To Why We Use a Database

Customizability for User Friendliness

Traditional centralized databases can be customized by the administrator depending on business requirements. They can also be distributed across many locations, with the data merged into a master database for queries and reports. They offer robust features that allow developers to create applications that give users a more consistent and user friendly interface.

Stability

When properly managed, a database system can handle large volumes of data and process thousands of transactions per second. Databases are also fast because, being permissioned, they grant write access only to a select few, and the data is recorded on only a few servers while remaining available to many users. A database doesn’t run on many nodes; it just requires a powerful server to process data at the backend while a frontend host provides an interface. Speed in databases can be optimized through hardware using RAID Level 1 and through other techniques like sharding and shrinking. In the event of a disaster, an administrator can roll back changes as well. All manner of updates and security are handled by the administrator, who manages the entire system.
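Rolling back changes is built into most database systems through transactions. Here is a minimal sketch with sqlite3, where the accounts table, the names and the balances are hypothetical. A failed transfer leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100.0), ("bob", 20.0)]
)
conn.commit()


def transfer(connection, sender, receiver, amount):
    """Move funds atomically; any failure rolls back both updates."""
    try:
        with connection:  # commit on success, roll back on exception
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected a negative balance


def balances(connection):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(connection.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 500.0))  # False: alice cannot cover it
print(balances(conn))                         # {'alice': 100.0, 'bob': 20.0}
```

The credit to bob had already been applied when the debit failed, yet the rollback undid it. A blockchain has no equivalent of this undo once a block is validated.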

Transaction Speed and Volume

Today’s databases are designed for both high volume transaction processing and data analytics. This means they are tried, tested and true for mission critical operations in enterprise production environments.

Advantages To Why We Use a Blockchain

Decentralization

A decentralized system is highly fault tolerant. If a node on the Bitcoin network crashes, for example, it doesn’t bring the entire system down. There are other nodes on the network that run the blockchain. Decentralization also adds more security, since the information stored on one computer is copied to all nodes in the network. This means that if one node were compromised, a hacker would still need to change the information on all the other nodes to manipulate the data. This has proven to be a good safeguard in deterring attacks against the system.

Immutability

A blockchain stores information that becomes immutable, meaning it cannot be changed once a block has been validated. This also makes it resistant to tampering and manipulation because the information is recorded on a digital public ledger stored on many nodes. To compromise it means to change that information in all the nodes on the network.

Transparency

A key feature of blockchains that provides a benefit to business is transparency. This makes everything recorded on the blockchain censorship resistant. Information about a transaction cannot be hidden, so this creates more trust and adds value to the system. Using the blockchain requires no permission from anyone; it is an open platform for all in a public environment.

Security

Since blockchains use advanced cryptographic technology and a distributed, decentralized network, they offer a secure environment. Modifying data in a block requires expending plenty of compute resources. It is also impractical because it requires changing the data on all nodes on the network. This is what deters attacks, since attacking is more costly than mining blocks for rewards. This feature helps protect the blockchain from rogue miners and hackers.

The Problems With Databases

Single Point of Failure

Since a database is centralized, there is one point of failure. The data is in the hands of a single entity or group, so there is no way to guarantee it is being used for the right purpose, as in the case of data from social media winding up in the hands of bad actors. A company that has control of information can monetize it for third party use, but sometimes that is not in the best interest of users. A hacked database is another issue, since it can affect many users’ information. When a database server fails, it also affects the entire system. If there is no backup of the information stored in the database, then there is no way to recover valuable data. This is why failover and redundancy are important in centralized systems.

Administrator Account

Since a database requires an administrator, if the password is lost it becomes harder to recover a database. If the DB Admin has no delegate administrator that has privileges to a database management system, no one can create new databases or modify existing databases. Another problem with this is when a DB Admin leaves the company, it becomes a very tedious process having to reset passwords and elevate the privilege of a new administrator. It is likely someone might forget to change a password or remove certain privileges or delete the account of former employees who have access to the database. This is something MIS departments have to deal with in order to keep their information secure.

Security Issues

In a centralized system, if the administrator forgets to apply patches and updates, the system can be vulnerable to security exploits by hackers. This makes databases prone to breaches. Centralization should make management simpler, but at other times when not properly done it can cause very critical problems that affect data integrity in a system. Trusting all our information to one company is the norm, but it can become a problem if the company does not adhere to best practice in information security. Hacks have already affected many major companies, and data breaches are becoming more common as information is a valuable asset. This is why third party audits and strict regulations are in place for data security involving production databases.

The Problems With Blockchains

Energy Consumption

First and foremost, the compute resources needed to run a blockchain like Bitcoin expend large amounts of electricity. This is part of the protocol required to process transactions in the Proof-of-Work algorithm. All the energy is used by the miners to solve cryptographic puzzles to validate blocks. The amount of energy consumed rises as the difficulty level increases, which happens when more hashing power joins the network. The more nodes you have mining, the greater the computational effort required to validate a block of transactions, and the more energy is consumed. The whole Bitcoin network has been estimated to consume the same amount of electricity as a small country like Haiti or Denmark.

Scalability

Blockchains do not scale well when it comes to high volume transactions. Due to the fixed block size, there is a limit on how many transactions fit in each block, so problems arise as transaction volume increases. The resulting delays also affect transaction velocity; most blockchains cannot process more than 15 transactions per second. Scaling solutions have become the focus of many projects that aim to handle more transactions and reduce processing time. If claims of 1 million transactions per second are proven on a blockchain (not yet as of this writing), that could significantly disrupt the rest of the industry.
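The throughput ceiling follows directly from block size and block interval. Here is a rough calculation using Bitcoin’s original 1 MB block limit and 10-minute block interval. The 250-byte average transaction size is an assumption for illustration, since real transactions vary in size.

```python
def max_transactions_per_second(block_size_bytes, avg_tx_bytes, block_interval_seconds):
    """Upper bound on throughput when blocks have a fixed size and cadence."""
    tx_per_block = block_size_bytes // avg_tx_bytes
    return tx_per_block / block_interval_seconds


# 1 MB blocks every 600 seconds; 250 bytes per transaction is an assumption.
tps = max_transactions_per_second(1_000_000, 250, 600)
print(round(tps, 1))  # 6.7
```

No amount of extra mining hardware raises this number, because the difficulty adjustment holds the block interval steady. Only changes to block size, block interval or transaction encoding do, which is why scaling is a protocol design problem.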

Size

An issue with most databases, including blockchains, is their size. When they get bigger, they consume more storage space and this makes them slow down. Bitcoin’s blockchain is already larger than 100 GB, while Ethereum’s blockchain has surpassed 1 TB (as of this writing). It’s not just a storage issue for nodes, but a network issue as well. With larger blockchain sizes, it takes much longer to copy them to new nodes on the network. It can take several hours to days depending on the network bandwidth. A larger blockchain requires more bandwidth to transmit to another node. This affects new nodes, as well as nodes that come back online after not being updated in a long time.
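The “several hours to days” estimate can be checked with simple arithmetic. The sketch below only counts raw transfer time and ignores the time a node spends validating blocks. The bandwidth figures are assumptions for illustration.

```python
def download_hours(chain_size_gb, bandwidth_mbit_per_s):
    """Hours needed just to transfer the chain, ignoring validation time."""
    size_bits = chain_size_gb * 1e9 * 8  # decimal gigabytes to bits
    seconds = size_bits / (bandwidth_mbit_per_s * 1e6)
    return seconds / 3600


print(round(download_hours(100, 10), 1))   # 22.2 hours for 100 GB at 10 Mbit/s
print(round(download_hours(1000, 10), 1))  # 222.2 hours (over 9 days) for 1 TB
```

At the same connection speed, a chain ten times larger takes ten times longer to copy, so the cost of joining the network keeps growing with the chain.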

High Transaction Fees

The fees to process transactions are another issue that Bitcoin has faced. Whenever demand is high, transaction fees also go up, to the benefit of miners. Keeping transaction fees low or removing them entirely is a challenge for blockchain designers. With high transaction fees, users are deterred from using the network. When scaling solutions solve the problems with transaction speed and volume, fees should become more reasonable.

Interoperability

This is currently an issue since, unlike traditional databases, each blockchain is very much its own ecosystem. For example, transferring value from Bitcoin to another blockchain like Ethereum requires the use of a digital exchange. There are protocols that aim to make blockchains interoperate with each other, and developers are finding ways to make dissimilar blockchains interoperable so that the transfer of value becomes much simpler.

Best Use Cases For Databases and Blockchains

Databases are best for enterprise networks because of their stability. They are also more user friendly and have many supported management systems for administrators and developers. The top 500 companies on Forbes make use of databases that run high-end systems dealing with large volumes of data. Databases can scale to millions of records and process thousands of transactions per second very easily. For systems that deal with high volume traffic, like retail, a database is still the best solution. The stock market is better off with a database that can quickly store information and allow instant retrieval without the need for miners to validate the data.

A blockchain does not need to store the large amounts of numerical data used in analytical processing. A database can store this data much better and process it faster as well, since it does not require multiple nodes to process each piece of data. You also don’t need to encrypt or hash every piece of data you store in a database. By default, databases are unencrypted because encryption adds a lot of overhead in a live database. Being permissioned is the security feature of a traditional database. A database that is archived can be encrypted, however.

Databases have proven their reliability for storing information and providing quick queries to retrieve data for reports and analytical purposes. Unstructured data is another thing that does not require a blockchain; it is more suitable for database management systems. Data that does not need trust verification to be used, like the number of items sold by a store at the end of the day, is best recorded in a database. It is also more costly to use a blockchain for something as simple as private bookkeeping, since a standalone database is more efficient. Personal information that only a certain company needs to know, like social security and medical records, is best stored in databases. This information can be used with public verification systems that rely on a blockchain. The personal information can be obscured but still verified on the blockchain based on public key cryptography.
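One way to keep a record private while still making it publicly verifiable is to publish only a fingerprint of it. The sketch below uses a salted SHA-256 hash commitment, which is a different and simpler technique than the public key cryptography mentioned above. I use it here because it needs nothing beyond Python’s standard library. The record contents and function names are hypothetical.

```python
import hashlib
import hmac
import os


def commit(record: bytes):
    """Return (salt, digest); only the digest would be published."""
    salt = os.urandom(16)  # kept private so short records are hard to guess
    digest = hashlib.sha256(salt + record).hexdigest()
    return salt, digest


def verify(record: bytes, salt: bytes, published_digest: str) -> bool:
    """Anyone holding the record and salt can check it against the public digest."""
    candidate = hashlib.sha256(salt + record).hexdigest()
    return hmac.compare_digest(candidate, published_digest)


record = b"patient: John Smith; blood type: O+"
salt, public_digest = commit(record)  # the record and salt stay in the database

print(verify(record, salt, public_digest))                    # True
print(verify(b"patient: John Smith; blood type: A-", salt, public_digest))  # False
```

The database holds the sensitive record, while the public ledger would hold only the digest. That is enough to prove later that the record has not been altered, without revealing what it says.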

A Database Is Ideal For:

  • Data that needs continuous updating, like monitoring and sensors
  • Fast online transaction processing
  • Confidential information (non-transparent to the public)
  • Financial data from markets that require fast processing
  • Data that does not require verification
  • Standalone applications that store data
  • Relational data

The requirements for blockchains are to establish trust and transparency. A blockchain is simply a digital public ledger which allows everyone access to information. In this case it can help with validating information from business-to-business (B2B) transactions related to supply chain, distribution and inventory. Transparency can help industries like advertising minimize fraud by building more verification of an advertiser’s company and the source of ad spends. Blockchains, while not meant for large scale data records, can be implemented for validating information.

Bitcoin is the first successful implementation of a blockchain, and it works well as a system for transferring value and validating payments in transactions. Part of Bitcoin’s success is that it addresses the double spend problem in digital payment systems, which would otherwise allow users to spend the same coin more than once. Bitcoin implements a protocol that validates transactions using confirmations, based on a chronological order with timestamps and on the funds the user has available. This helps to prevent double spending because the system does not process transactions simultaneously; they are always handled in chronological order.
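The double spend check can be illustrated with a toy ledger. This is a heavily simplified model and not Bitcoin’s actual transaction format: I assume each coin has an identifier and a single current owner, and that transactions are applied strictly one at a time, in order.

```python
class Ledger:
    """Toy ledger that processes transactions strictly one at a time, in order."""

    def __init__(self, unspent_coins):
        # Maps a coin identifier to its current owner (hypothetical model).
        self.unspent = dict(unspent_coins)

    def spend(self, coin_id, sender, receiver):
        """Accept the transfer only if the sender still owns the coin."""
        if self.unspent.get(coin_id) != sender:
            return False  # already spent by this sender, or never theirs
        self.unspent[coin_id] = receiver
        return True


ledger = Ledger({"coin-1": "alice"})
print(ledger.spend("coin-1", "alice", "bob"))    # True
print(ledger.spend("coin-1", "alice", "carol"))  # False: double spend rejected
```

Because the first transfer is recorded before the second is examined, the second attempt finds that alice no longer owns the coin. Ordering the transactions is what makes the check possible.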

Some projects are exploring blockchains for permissioned systems like those used in voting stations. It makes a lot of sense on paper since a blockchain can verify both the identity and the vote made by a person. The purpose is to prevent cheating, so blockchains really aim to enforce fairness in trustless and permissionless systems and likewise in trusted and permissioned systems as well. In the case of the latter, some blockchains don’t require cryptocurrency or mining, like in enterprise blockchains. These are a new class of systems that use blockchain technology in a private and permissioned environment, and sometimes integrated with databases to form a hybrid system.

One thing long time database administrators will notice is that blockchains are non-relational. You cannot create joins on different blockchains and relate data. This is a major difference between the two, so when information needs to be relational a blockchain will not be suitable for it.

Other blockchains implement what are called “Smart Contracts”, like those on the Ethereum network. These are much like stored procedures and triggers in a database, in which code is executed to process a transaction. In Ethereum’s network, a smart contract executes as bytecode on all nodes in the network. Ethereum and other cryptocurrencies like EOS and NEO use blockchains as a platform for their smart contract ecosystems. This is another example of how blockchain use can differ from traditional databases.

A Blockchain Is Ideal For:

  • Monetary transactions
  • Transfer of value
  • Verification of trusted data (identity, reputation, credibility, integrity, etc.)
  • Public Key Verification
  • Decentralized applications (DApps)
  • Voting systems

There are many other things to discuss about databases and blockchains that were not discussed due to their broad range of topics. What I present here are just some of the facts and observations, from a technical perspective. In the end it is not that databases are better than blockchains or blockchains are better than databases. They both have their purpose and how they are used depends on what you want to do with your data.

_________________________________________________________________

Note: This article is based on the author’s research and knowledge of database and blockchain technology. Please, do share your own views regarding this subject matter in responses, thank you.