Getting started with Distributed SQL

Written by alejandroduarte | Published 2023/01/09
Tech Story Tags: mariadb | docker | xpand | sql | databases | high-availability | scalability | distributed-sql

TLDRDistributed databases allow you to scale reads and writes by adding nodes to the cluster while keeping all the benefits of relational databases. A key feature of a distributed SQL database is that it makes a cluster look like a single logical database. MariaDB Xpand rebalances the data when a node goes down, a new node is added, or a hotspot is detected.via the TL;DR App

A distributed SQL database allows you to scale reads and writes by adding nodes to the cluster while keeping all the benefits of relational databases. A key feature of a distributed SQL database is that it makes a cluster look like a single logical database. Applications don’t need to know the number of nodes or implement sharding logic.

MariaDB Xpand

MariaDB Xpand is a distributed SQL database compatible with MariaDB in which SQL tables are automatically partitioned based on the hash of a subset of columns (the primary key by default). Each partition forms a slice of data that is placed in a node. When a write happens, the hash is calculated, and the operation is forwarded to the correct node. MariaDB Xpand rebalances the data when a node goes down, a new node is added, or a hotspot is detected. All this happens automatically and transparently to the application.

Multiple writes can be sent in parallel to multiple nodes. Reads are forwarded to the nodes that contain the slices with the data and can run in parallel minimizing lag. This architecture is able to handle large amounts of data and maintain high levels of performance that surpass non-distributed databases. The query is sent to the data instead of data to the query.

The following diagram is an example of a MariaDB Xpand database cluster with three nodes. Each slice is shown in a different color. The diagram also shows how replicas are automatically maintained for high availability:

During a scale-out process, a new node is added (xpand_4 in the following figure). MariaDB automatically rebalances the data to increase the overall capacity of the system:

Distributed SQL is the best alternative to manual database sharding. It frees DevOps engineers and DBAs from tedious management operations such as data rebalancing. It also frees developers from implementing and maintaining custom sharding logic that increases the complexity of queries (especially with cross-node joins). A developer can simply write a SQL query as if it was a single-node database.

Getting started with Docker

The best way to get started with Distributed SQL is to try it out. Docker makes this process extremely simple. To spin up a single-node MariaDB Xpand instance suitable for testing and experimentation using Docker, run the following:

docker run --rm -p3306:3306 --ulimit memlock=-1 --name xpand mariadb/xpand-single

You can connect to the database as if it was a single-node MariaDB server. See this website to learn how to connect from multiple programming languages (Java, JavaScript/NodeJS, Python, C++). You can also connect from SQL clients such as DBeaver, DataGrip, or any IDE extensions for SQL databases.

Once you have tried your application with a single-node Xpand instance, the best way to test its scalability and other advanced features (like automatic scaling and columnar indexing for analytics) is to head through SkySQL, a cloud service that offers a free tier (no credit card required).

There also are plenty of online resources to learn more about Distributed SQL and MariaDB Xpand:


Written by alejandroduarte | Software Engineer - Published Author - Award winner - Developer Relations Engineer at MariaDB plc
Published by HackerNoon on 2023/01/09