1,306 reads

Using MariaDB as a MongoDB NoSQL Database

by Alejandro DuarteSeptember 4th, 2023

Too Long; Didn't Read

MariaDB understands NoSQL (MongoDB). You can send NoSQL queries to MariaDB and later mix NoSQL data with SQL data in a single SQL query.

featured image - Using MariaDB as a MongoDB NoSQL Database

It's not uncommon for organizations to rely on multiple types of databases to meet various needs. From MongoDB's JSON-like documents to MariaDB's transactional SQL tables, each database has its own unique strengths. But what happens when you need to merge data from these different databases, say, for reporting purposes? The conventional route involves querying each database separately and merging the results programmatically. However, this method can be cumbersome and inefficient. In this article, we will explore a more streamlined approach: Using an intelligent database proxy, we'll demonstrate how you can perform complex SQL joins between a MariaDB and a MongoDB database, all while keeping your application's architecture clean and efficient.

Combining data programmatically

Suppose you have a Node.js app connected to a MongoDB database for storing book ratings and another app in Java linked to a MariaDB database for managing book details. You're tasked with generating a report combining titles and ratings for each book. Since the MongoDB database lacks information on book titles and the MariaDB database lacks information on ratings, you'll need to merge data from two very different databases. Typically, you'd query each database separately, then use a common identifier like ISBN to match and combine the data. This can be achieved in any programming language that can connect to both databases, such as Java, TypeScript, C#, or Python, as shown in the following figure:

While manually joining data works, databases are designed for such operations, making them more efficient. This approach also turns your SQL application into a complex, harder-to-maintain database polyglot. By using a database proxy like MaxScale, you can handle the data-joining task at the database level using SQL, keeping your application single-faceted. Though this adds another component to your infrastructure, you gain benefits like automatic failover, data masking, and security features. MaxScale is a versatile proxy compatible with both SQL and NoSQL databases and even Kafka. This allows you to integrate your NoSQL app with a relational database, benefiting from ACID compliance.

To follow the rest of this article, make sure you have Docker, the new MariaDB-shell tool, and the mongosh tool installed on your computer.

Setting up the environment with Docker Compose

To set up the database environment, use the following docker-compose.yml file:

version: "3.9"
services:
  mariadb:
    image: alejandrodu/mariadb
    environment:
      - MARIADB_CREATE_DATABASE=demo
      - MARIADB_CREATE_USER=user:Password123!
      - MARIADB_CREATE_MAXSCALE_USER=maxscale_user:MaxScalePassword123!

  maxscale:
    image: alejandrodu/mariadb-maxscale
    command: --admin_host 0.0.0.0 --admin_secure_gui false
    ports:
      - "3306:4000"
      - "27017:27017"
      - "8989:8989"
    environment:
      - MAXSCALE_USER=maxscale_user:MaxScalePassword123!
      - MARIADB_HOST_1=mariadb 3306
      - MAXSCALE_CREATE_NOSQL_LISTENER=user:Password123!

The previous Docker Compose file sets up two services for local deployment: a MariaDB database server and a MaxScale database proxy. The MariaDB server is configured with a demo database and a user named user with the password Password123!, granted relevant privileges. Additionally, a maxscale_user with the password MaxScalePassword123! is created for MaxScale's use.

The proxy’s web UI is configured to disable HTTPS—something not advisable in a production environment. The file also specifies various ports to expose and the details of the database user as well as the MariaDB server location. Typically, you would use an IP address for this location, but here, the container name suffices when using Docker Compose. Finally, a NoSQL listener is established for MongoDB client connections using the default MongoDB port 27017.

To launch the services via the command line, navigate to the folder where your Docker Compose file is stored and execute the following command:

docker compose up -d

Note:* While these Docker images are for testing and not for production, their source code is available on GitHub. For other purposes, check the MariaDB's official images on Docker Hub.

Configuring the SQL database

Connect to the MariaDB (relational) database by running the following command in your terminal:

mariadb-shell --dsn mariadb://user:'Password123!'@127.0.0.1

Switch to the demo database:

use demo;

Create the following SQL table:

CREATE TABLE books(
  isbn VARCHAR(20) PRIMARY KEY,
  title VARCHAR(256),
  year INT
);

Insert a few rows into the books table (sorry, I couldn’t resist the cliché of using my own books):

INSERT INTO books(title, isbn, year)
VALUES
  ("Vaadin 7 UI Design By Example", "978-1-78216-226-1", 2013),
  ("Data-Centric Applications with Vaadin 8", "978-1-78328-884-7", 2018),
  ("Practical Vaadin", "978-1-4842-7178-0", 2021);

Confirm that the data is stored correctly:

SELECT * FROM books;

Creating a NoSQL collection

Even without installing a MongoDB server, we can use a MongoDB client to create collections and documents in MariaDB. The difference is that the data resides in a robust, fully ACID-compliant, and scalable relational database. Let's give it a try!

In the terminal, use the MongoDB shell tool to connect to what appears to be a MongoDB server but actually is a MariaDB database! Just execute the following command:

mongosh

When parameters are omitted, the tool connects to a MongoDB server (which is MariaDB in our case) on your local machine (127.0.0.1) via the default port (20017). If all goes as planned, executing the following command should show the demo database in the list:

show databases

Switch to that database:

use demo

We've connected to a relational database using a non-relational client! Now, let's create the ratings collection and populate it with some example data:

db.ratings.insertMany([
	{
		"isbn": "978-1-78216-226-1",
		"starts": 5,
		"comment": "A good resource for beginners who want to learn Vaadin"
	},
	{
		"isbn": "978-1-78328-884-7",
		"starts": 4,
		"comment": "Explains Vaadin in the context of other Java technologies"
	},
	{
		"isbn": "978-1-4842-7178-0",
		"starts": 5,
		"comment": "The best resource to learn web development with Java and Vaadin"
	}
])

Verify that the ratings have been saved in the database:

db.ratings.find()

Understanding NoSQL collections in MariaDB

Now, we have a unified database that appears as both a NoSQL (MongoDB) and a relational (MariaDB) database externally. We can interact with this single database using both MongoDB and SQL clients. All data resides in MariaDB, allowing us to use SQL to merge data created by MongoDB and MariaDB clients or apps. Let's delve into how MaxScale utilizes MariaDB to house MongoDB data, including collections and documents.

Use the mariadb-shell client to connect to the database and display the tables within the demo schema (check the documentation for more information on this new CLI client):

show tables in demo;

You should see tables for both books and ratings data. The ratings table, originally a MongoDB collection, was created via MaxScale's translation of MongoDB commands into a SQL table format. Let's examine this table's structure:

describe demo.ratings;

The ratings table has two columns:

id: the object identifier
doc: the JSON-formatted document

All rating data is stored in the doc column as JSON. Run the following query:

SELECT doc FROM demo.ratings \G

Mixing SQL and NoSQL data in a single query

Revisiting our main objective of pairing book titles with ratings, let's hypothetically consider that the ratings table has conventional columns for stars and comment. If this were the case, merging it with the books table would be a simple task.

/* this doesn't work */
SELECT b.title, r.stars, r.comment
FROM ratings r
JOIN books b USING(isbn)

To include the doc column from the ratings table in the query, we'll need to transform it into a relational expression that can act like a new table in the SQL join. For example:

/* this still doesn't work */
SELECT b.title, r.stars, r.comment
FROM ratings rt
JOIN ...something to convert rt.doc to a table... AS r
JOIN books b USING(isbn)

The solution to this is the JSON_TABLE function. MariaDB offers a wide range of JSON functions for handling JSON data. We'll employ JSON_TABLE to transform the doc column into a relational structure suitable for SQL joins. Here's the general syntax:

JSON_TABLE(json_document, context_path COLUMNS (
    column_definition_1,
    column_definition_2,
    ...
  )
) [AS] the_new_relational_table

Where:

json_document is an expression with the JSON text
Context_path` is a JSON Path expression that defines the nodes to be used as the source of the rows

The syntax for column definitions is as follows:

new_column_name sql_type PATH path_in_the_json_doc [on_empty] [on_error]

Using all this, our SQL query would be structured as follows:

SELECT b.title, r.stars, r.comment
FROM ratings rt
JOIN JSON_TABLE(rt.doc, '$' COLUMNS(
    isbn VARCHAR(20) PATH '$.isbn',
    stars INT PATH '$.starts',
    comment TEXT PATH '$.comment'
  )
) AS r
JOIN books b USING(isbn);

You could use the ISBN as the MongoDB ObjectID, making it the id column in the ratings table. I'll leave that for you to try (hint: use _id instead of isbn when inserting data through the MongoDB client or app).

Is this scalable?

The common belief that relational databases can't scale horizontally—by adding more nodes—as NoSQL databases can is misleading. In fact, relational databases can scale while still maintaining their ACID compliance. Take MariaDB as an example: it offers a range of specialized storage engines suited to various tasks. Using the Spider storage engine, you can implement data sharding to effectively scale your MariaDB database. Additionally, MariaDB allows you to select different storage engines for different tables, depending on the workload. What's more, it even supports cross-engine joins within a single SQL query.

An even more powerful option for scaling is to use MariaDB Xpand for [distributed SQL]. This approach provides a unified, logical view of a relational database to applications, made possible through “automatic sharding”. Utilizing a shared-nothing architecture, MariaDB Xpand is capable of scaling both read and write operations almost linearly.

Conclusion

We've accomplished our goal! Now, our systems offer a scalable, ACID-compliant, unified view of your data, regardless of its SQL or NoSQL origin. This reduces the need to transition apps from NoSQL to SQL or diversify the databases they connect to. For more on MaxScale's features, check out this video or consult the documentation.