paint-brush
Why We Should Have Different Databasesby@artemg
601 reads
601 reads

Why We Should Have Different Databases

by Artem GoginMay 13th, 2021
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

There are only a few types of databases that can achieve mostly the same goals. In-memory databases can provide real-time response for selecting and inserting records. Document-oriented databases allow storing records in a document-oriented format. Columnar databases store data as key/value records on HDD or SSD. These solutions are designed to scale well enough to manage petabytes of data across thousands of commodity servers in a distributed system. They represent the SSTable architecture, designed for two use cases: fast access by key and fast writing.

Companies Mentioned

Mention Thumbnail
Mention Thumbnail
featured image - Why We Should Have Different Databases
Artem Gogin HackerNoon profile picture

Today there are hundreds of SQL and NoSQL databases. Some of them are popular, some are ignored. Some are user-friendly and well documented and some are hard to use. Some are open-sourced and some are proprietary. And, perhaps, the most important - some are scalable, optimized, highly available and some are difficult to scale or maintain.

There comes the natural question: how to choose a database? To answer it, we should decide, what we want to achieve with a database. To create a view, we should answer questions like these:

  1. Do we need analytical access to the database?
  2. Do we need real-time writing or reading?
  3. How many tables/records we want to store?
  4. What availability do we need?
  5. Do we need columns?
  6. Will we access tables with filter by column or with filter by row?

When the decision is done, we need to keep in mind, what one or another database able to offer. Particular features of each database may vary, but in general, there are only a few types of databases. Within these types, we can achieve mostly the same goals. Let’s look at them closely.

1. SQL relational databases.

If you have ever worked with databases, most likely you have begun with this type of database. This type is the most popular and widespread. These databases allow storing data in relational tables with defined columns of a particular type. Relational tables are good normalization and joins.

Advantages

  • SQL support
  • ACID transactions (Atomicity, Consistency, Isolation, and Durability)
  • Indexing and partitioning support

Disadvantages

  • Poor support of unstructured data / complex types
  • Bad optimization for event processing
  • Difficult/expensive scaling

Examples: Oracle DB, MySQL, PostgreSQL.

2. Document-oriented databases.

If we don’t want to join several tables to retrieve desired data, we can look at the document-oriented databases. These databases allow storing records in JSON-like format. With this format, we can create complex value for any key and include all the data structure in one record at once.

Advantages

  • Schema free
  • No need to always write all the fields in every record
  • Good complex types support
  • OLTP suited

Disadvantages

  • Poor transactions support
  • Poor analytics support
  • Difficult/expensive scaling

Examples: MongoDB

3. In-memory databases.

Databases of this type can provide real-time response for selecting and inserting particular records. Most of them mainly store data into RAM but also offer persistent storage on HDD or SSD for some cases. Most of these databases operate with key/value records, so the values may recall document-oriented format. But some databases also operate with columns and allow secondary indexing in the same table. Using RAM allows to process data rapidly but makes it more unstable and expensive.

Advantages

  • Fast writing
  • Fast reading

Disadvantages

  • Difficult reliability
  • Expensive scaling

Examples: Redis, Tarantool, Apache Ignite

4. Wide-column databases.

These databases store data as key/value records on HDD or SSD. These solutions are designed to scale well enough to manage petabytes of data across thousands of commodity servers in a distributed system. They represent the SSTable architecture. This architecture was designed for two use cases: fast access by key and fast, highly available writing.

Advantages

  • Fast writing row by row
  • Fast reading by key
  • Good scalability
  • High availability

Disadvantages

  • Key/value format
  • No analytics support

Examples: Cassandra, HBase

5. Columnar databases.

Sometimes we need to access data fast not with particular keys, but with particular columns. In this case, we better get rid of inserting row by row and move to batch writing. Batch inserts allow columnar databases to prepare the data for rapid read by columns.

Advantages

  • Fast reading by column
  • Good analytics support
  • Good scalability

Disadvantages

  • Only good for batch inserts

Examples: Vertica, Clickhouse

6. Search engine

If we want to access the data with filter by any value and even with any word in column, we should remember search engines. These databases perform indexing of every word in columns and allow full-text search. They are perfect for storing and analyzing logs or large text values.

Advantages

  • Quick access by any word
  • Good scalability

Disadvantages

  • Only good for batch inserts
  • Poor analytics support

Examples: ElasticSearch, Apache Solr

7. Graph databases

For some use cases exist graph data structures. We can find their realization in graph databases. If your tasks require working with graphs, there are special databases designed to satisfy your needs.

Advantages

  • Graph data structure
  • Manageable relations between entities
  • Flexible structures

Disadvantages

  • Special query language
  • Difficult to scale

Examples: Neo4j

Conclusion

Almost every task can be done with almost any type of database. The question is how expensive and optimized it would be. Choosing the tool you are used to can reduce your time to market, but it also can cost you an enormous amount of money to maintain and expand your hardware, which may be used inefficiently. Always try to use a database in the way it was meant to use. Perhaps, a solution that suits your needs already exists.