It's not uncommon for organizations to rely on multiple types of databases to meet various needs. From MongoDB's JSON-like documents to MariaDB's transactional SQL tables, each database has its own unique strengths. But what happens when you need to merge data from these different databases, say, for reporting purposes? The conventional route involves querying each database separately and merging the results programmatically. However, this method can be cumbersome and inefficient. In this article, we will explore a more streamlined approach: Using an intelligent database proxy, we'll demonstrate how you can perform complex SQL joins between a MariaDB and a MongoDB database, all while keeping your application's architecture clean and efficient.
Suppose you have a Node.js app connected to a MongoDB database for storing book ratings and another app in Java linked to a MariaDB database for managing book details. You're tasked with generating a report combining titles and ratings for each book. Since the MongoDB database lacks information on book titles and the MariaDB database lacks information on ratings, you'll need to merge data from two very different databases. Typically, you'd query each database separately, then use a common identifier like ISBN to match and combine the data. This can be achieved in any programming language that can connect to both databases, such as Java, TypeScript, C#, or Python, as shown in the following figure:
While manually joining data works, databases are designed for such operations, making them more efficient. This approach also turns your SQL application into a complex, harder-to-maintain database polyglot. By using a database proxy like MaxScale, you can handle the data-joining task at the database level using SQL, keeping your application single-faceted. Though this adds another component to your infrastructure, you gain benefits like automatic failover, data masking, and security features. MaxScale is a versatile proxy compatible with both SQL and NoSQL databases and even Kafka. This allows you to integrate your NoSQL app with a relational database, benefiting from ACID compliance.
To follow the rest of this article, make sure you have Docker, the new MariaDB-shell tool, and the mongosh tool installed on your computer.
To set up the database environment, use the following docker-compose.yml file:
version: "3.9"
services:
mariadb:
image: alejandrodu/mariadb
environment:
- MARIADB_CREATE_DATABASE=demo
- MARIADB_CREATE_USER=user:Password123!
- MARIADB_CREATE_MAXSCALE_USER=maxscale_user:MaxScalePassword123!
maxscale:
image: alejandrodu/mariadb-maxscale
command: --admin_host 0.0.0.0 --admin_secure_gui false
ports:
- "3306:4000"
- "27017:27017"
- "8989:8989"
environment:
- MAXSCALE_USER=maxscale_user:MaxScalePassword123!
- MARIADB_HOST_1=mariadb 3306
- MAXSCALE_CREATE_NOSQL_LISTENER=user:Password123!
The previous Docker Compose file sets up two services for local deployment: a MariaDB database server and a MaxScale database proxy. The MariaDB server is configured with a demo
database and a user named user
with the password Password123!
, granted relevant privileges. Additionally, a maxscale_user
with the password MaxScalePassword123!
is created for MaxScale's use.
The proxy’s web UI is configured to disable HTTPS—something not advisable in a production environment. The file also specifies various ports to expose and the details of the database user as well as the MariaDB server location. Typically, you would use an IP address for this location, but here, the container name suffices when using Docker Compose. Finally, a NoSQL listener is established for MongoDB client connections using the default MongoDB port 27017.
To launch the services via the command line, navigate to the folder where your Docker Compose file is stored and execute the following command:
docker compose up -d
Note:* While these Docker images are for testing and not for production, their source code is available on GitHub. For other purposes, check the MariaDB's official images on Docker Hub.
Connect to the MariaDB (relational) database by running the following command in your terminal:
mariadb-shell --dsn mariadb://user:'Password123!'@127.0.0.1
Switch to the demo database:
use demo;
Create the following SQL table:
CREATE TABLE books(
isbn VARCHAR(20) PRIMARY KEY,
title VARCHAR(256),
year INT
);
Insert a few rows into the books
table (sorry, I couldn’t resist the cliché of using my own books):
INSERT INTO books(title, isbn, year)
VALUES
("Vaadin 7 UI Design By Example", "978-1-78216-226-1", 2013),
("Data-Centric Applications with Vaadin 8", "978-1-78328-884-7", 2018),
("Practical Vaadin", "978-1-4842-7178-0", 2021);
Confirm that the data is stored correctly:
SELECT * FROM books;
Even without installing a MongoDB server, we can use a MongoDB client to create collections and documents in MariaDB. The difference is that the data resides in a robust, fully ACID-compliant, and scalable relational database. Let's give it a try!
In the terminal, use the MongoDB shell tool to connect to what appears to be a MongoDB server but actually is a MariaDB database! Just execute the following command:
mongosh
When parameters are omitted, the tool connects to a MongoDB server (which is MariaDB in our case) on your local machine (127.0.0.1) via the default port (20017). If all goes as planned, executing the following command should show the demo
database in the list:
show databases
Switch to that database:
use demo
We've connected to a relational database using a non-relational client! Now, let's create the ratings
collection and populate it with some example data:
db.ratings.insertMany([
{
"isbn": "978-1-78216-226-1",
"starts": 5,
"comment": "A good resource for beginners who want to learn Vaadin"
},
{
"isbn": "978-1-78328-884-7",
"starts": 4,
"comment": "Explains Vaadin in the context of other Java technologies"
},
{
"isbn": "978-1-4842-7178-0",
"starts": 5,
"comment": "The best resource to learn web development with Java and Vaadin"
}
])
Verify that the ratings have been saved in the database:
db.ratings.find()
Now, we have a unified database that appears as both a NoSQL (MongoDB) and a relational (MariaDB) database externally. We can interact with this single database using both MongoDB and SQL clients. All data resides in MariaDB, allowing us to use SQL to merge data created by MongoDB and MariaDB clients or apps. Let's delve into how MaxScale utilizes MariaDB to house MongoDB data, including collections and documents.
Use the mariadb-shell
client to connect to the database and display the tables within the demo
schema (check the documentation for more information on this new CLI client):
show tables in demo;
You should see tables for both books and ratings data. The ratings
table, originally a MongoDB collection, was created via MaxScale's translation of MongoDB commands into a SQL table format. Let's examine this table's structure:
describe demo.ratings;
The ratings
table has two columns:
id
: the object identifierdoc
: the JSON-formatted document
All rating data is stored in the doc
column as JSON. Run the following query:
SELECT doc FROM demo.ratings \G
Revisiting our main objective of pairing book titles with ratings, let's hypothetically consider that the ratings
table has conventional columns for stars
and comment
. If this were the case, merging it with the books
table would be a simple task.
/* this doesn't work */
SELECT b.title, r.stars, r.comment
FROM ratings r
JOIN books b USING(isbn)
To include the doc
column from the ratings
table in the query, we'll need to transform it into a relational expression that can act like a new table in the SQL join. For example:
/* this still doesn't work */
SELECT b.title, r.stars, r.comment
FROM ratings rt
JOIN ...something to convert rt.doc to a table... AS r
JOIN books b USING(isbn)
The solution to this is the JSON_TABLE
function. MariaDB offers a wide range of JSON functions for handling JSON data. We'll employ JSON_TABLE
to transform the doc
column into a relational structure suitable for SQL joins. Here's the general syntax:
JSON_TABLE(json_document, context_path COLUMNS (
column_definition_1,
column_definition_2,
...
)
) [AS] the_new_relational_table
Where:
json_document
is an expression with the JSON textContext_path
` is a JSON Path expression that defines the nodes to be used as the source of the rows
The syntax for column definitions is as follows:
new_column_name sql_type PATH path_in_the_json_doc [on_empty] [on_error]
Using all this, our SQL query would be structured as follows:
SELECT b.title, r.stars, r.comment
FROM ratings rt
JOIN JSON_TABLE(rt.doc, '$' COLUMNS(
isbn VARCHAR(20) PATH '$.isbn',
stars INT PATH '$.starts',
comment TEXT PATH '$.comment'
)
) AS r
JOIN books b USING(isbn);
You could use the ISBN as the MongoDB ObjectID, making it the id
column in the ratings
table. I'll leave that for you to try (hint: use _id
instead of isbn
when inserting data through the MongoDB client or app).
The common belief that relational databases can't scale horizontally—by adding more nodes—as NoSQL databases can is misleading. In fact, relational databases can scale while still maintaining their ACID compliance. Take MariaDB as an example: it offers a range of specialized storage engines suited to various tasks. Using the Spider storage engine, you can implement data sharding to effectively scale your MariaDB database. Additionally, MariaDB allows you to select different storage engines for different tables, depending on the workload. What's more, it even supports cross-engine joins within a single SQL query.
An even more powerful option for scaling is to use MariaDB Xpand for [distributed SQL]. This approach provides a unified, logical view of a relational database to applications, made possible through “automatic sharding”. Utilizing a shared-nothing architecture, MariaDB Xpand is capable of scaling both read and write operations almost linearly.
We've accomplished our goal! Now, our systems offer a scalable, ACID-compliant, unified view of your data, regardless of its SQL or NoSQL origin. This reduces the need to transition apps from NoSQL to SQL or diversify the databases they connect to. For more on MaxScale's features, check out this video or consult the documentation.