MySQL Master-Slave Replication using Docker

Written by vladys_lav | Published 2020/07/07
Tech Story Tags: mysql | replication | docker | database | orchestration | backend | docker-compose | tutorial | hackernoon-es

TLDR This article is based on the repository docker-mysql-master-slave on the GitHub I have created a few years ago. It contains configurations and instructions with the MySQL replication example based on Docker. This is good to start understanding replication and test replication-unsafe statements and not intended for production use. It might be useful for DevOps and Software Engineers who want to create working environment more closely to production environment and for those who want better understand what MySQL replication really looks like. The problem here is the auto incremented columns sync.via the TL;DR App

In this article I want to share my experience of how to setup MySQL replication locally with using Docker.
It is based on this repository docker-mysql-master-slave on the GitHub I have created a few years ago. It contains configurations and instructions with the MySQL replication example based on Docker. This is good to start understanding replication and test replication-unsafe statements and not intended for production use. After getting dozens of starts I’ve considered to add some more explanations about how it works.
In this article I want to give you more detailed explanations about overall process. It might be useful for DevOps and Software Engineers who want to create working environment more closely to production environment and for those who want better understand what MySQL replication really looks like.

Prerequisites.

  1. You have Docker v.18 or higher installed.
  2. Installed docker-compose v1.23 or higher.
  3. Very basic MySQL knowledge.

Some theory.

MySQL replication is a special setup which involves two or more MySQL servers where one database server (known as master or source) is copied to another (known as slave or replica). Process of replica synchronisation is done through coping and performing SQL statements from source’s binary log. MySQL configuration allows to select the whole source database or only particular tables to be copied to the replica.
By default synchronisation type for MySQL replication is asynchronous (one-way), which means “replica” does not notify it’s “source” about results of coping and processing events. Additional types of synchronisation (semi-synchronous, synchronous) may be available via plugins or in special setups (like NDB Cluster).
With MySQL replication you can make some specific configuration types: chained replication, circular (also known as master-master or ring) and combinations of these. The limitation is that replica can have only one source server.
Chained replication means there is a chain of database servers.
Example: Source 1 — > Replica 1 — > Replica 2.
Replica 1 is source for Replica 2 and replica for Source 1. It’s useful for a case when Replica 1 contains a “merged” database, which consists of the “source” tables and its own “added” tables.
Circular replication supposes to have master databases in the circle, that serves also as replicas at the same time. Example: Master← → Master. The problem here is the auto incremented columns sync. Solution could be configuring 'auto_increment_increment' and 'auto_increment_offset' server variables to make increments with different steps or in different range according to each master server setup. If you are using InnoDB consider that fact that with ring replication, row will not be always added to the end of the replica index, in such case it may lead to additional insert latency on replica because of clustered index ordering.

Practice.

It’s time to make some practise! I believe that learn-by-doing is the best learning approach!
Create an empty directory and clone repository.
mkdir mysql-master-slave
cd mysql-master-slave
git clone https://github.com/vbabak/docker-mysql-master-slave ./
./build.sh
Build process requires ports 4406 and 5506 are not in use on your system. Otherwise you can update it to any non-used port in 'docker-compose.yml' file and re-run the build script.
If all goes smoothly you will get such messages:
Waiting for mysql_master database connection…
and finally a replica (slave) status report.
The last line says it is waiting for new updates from master.
To test replication is working, run this query on Master:
docker exec -it mysql_master bash
mysql -u root -p'111' mydb
mysql> create table if not exists code(code int);
# Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> insert into code values (100), (200);
# Query OK, 2 rows affected (0.01 sec)
And check Replica:
docker exec -it mysql_slave bash
mysql -u root -p'111' mydb
mysql> select * from code;

Under the hood. Master configuration.

Master server configuration placed into “master/conf/mysql.conf.cnf”. Here is some explaining. First 2 options are used to increase server performance and not related to the replication settings itself.
skip-host-cache
Disable use of the internal host cache for faster name-to-IP resolution.
skip-name-resolve
Disable DNS host name lookups
server-id = 1
For servers that are used in a replication, you must specify a unique server ID. It must be different from every other ID in use by any other source or replica.
log_bin = /var/log/mysql/mysql-bin.log
Enables bin log and sets the base name and path for the binary log files (like log_bin_basename).
binlog_format = ROW
Possible values are ROW (replica replay only actual changes on the row), STATEMENT (replica replay all the queries that changes the data), MIXED (statement-based replication is used unless server decides only row-based replication can give proper result, like replicating result of GUUID() ).
binlog_do_db = mydb
Specify a database, which statements will be written to binary log file.
Environment parameters related to launch MySQL in a docker container are placed into “master/mysql_master.env” file. They are described on the docker hub website for the mysql:5.7 image.
If you’re a Windows user, the build.sh script probably will not work, so you will need to setup master database with creating `mydb_slave_user` user — run 2 sql commands on Master and then setup slave database — run 2 sql commands on Replica , see details below.
Finally, add a replication user on master server. Create a new user for replication with REPLICATION SLAVE permission:
# SETUP MASTER SQL COMMANDS
GRANT REPLICATION SLAVE ON *.* TO "mydb_slave_user"@"%" IDENTIFIED BY "mydb_slave_pwd";
FLUSH PRIVILEGES;

Replica configuration.

Some of the configuration parameters repeat the Master database.
relay-log = /var/log/mysql/mysql-relay-bin.log
Contains database events, read from the source binary log.
Start a Replica.
First, you need to find a master host ip address. You can check “hosts” file on master host
docker exec -it mysql_master cat '/etc/hosts'
The last line will contain ip address, example: 172.19.0.2. This is a MASTER_HOST.
Second, find a log file and position from this command:
docker exec mysql_master sh -c 'export MYSQL_PWD=111; mysql -u root -e "SHOW MASTER STATUS \G"'
“File:” is MASTER_LOG_FILE and “Position:” is MASTER_LOG_POS.
Now, when we have all the variables, we can modify and run the following command to start a replication:
# SETUP REPLICA SQL COMMANDS
CHANGE MASTER TO MASTER_HOST='${IP}', MASTER_USER='mydb_slave_user', MASTER_PASSWORD='mydb_slave_pwd', MASTER_LOG_FILE='${LOG}', MASTER_LOG_POS=$POS;
START SLAVE;

Replication-unsafe statements and non-deterministic queries.

The “safeness” of a statement in MySQL replication refers to whether the statement and its effects can be replicated correctly using statement-based format. Statement is safe if it deterministic. Deterministic means statement always produce the same result. When using statement-based logging, statements flagged as being unsafe generate a warning. Check a list of unsafe statements here. Example: FOUND_ROWS(), RAND(), UUID().
For row-level replication no distinction is made for deterministic and non-deterministic statements. The drawback of row-based replication is that they are expensive on range updates in case when WHERE clause matches a lot of rows.

Final thoughts.

Replication is powerful part of any database and fortunately it is supported by MySQL.
The use range is wide. It can be used for heigh availability and data distributions, data backups, load balancing. And even for optimisation latency based on geolocation.
Thanks for reading!

Written by vladys_lav | Software Engineer
Published by HackerNoon on 2020/07/07