Utilizing the Elasticsearch Snapshot Module for Data Backups on Azure Blob Storage

When running a self-managed Elasticsearch cluster, as with any other database, it's important to make provisions for data backups. Elasticsearch data can't be backed up by simply copying its data files from one disk to another. This tutorial guides you through using the Elasticsearch snapshot module to create cluster snapshots, and uses Azure Blob Storage to securely store the backed-up data. Besides backups, the snapshot API also comes in handy for migrating data from one cluster to another.

As mentioned earlier, this tutorial uses Azure Blob Storage as the backup store. Other storage services such as Amazon S3 can be used as well, but to follow this tutorial fully you need an Azure subscription; you can sign up for a free trial here. You also need terminal access to the nodes of your Elasticsearch cluster.

Moving on...

STEP 1 Setting Up An Azure Blob Storage Account

Follow the steps below to create an Azure Blob Storage account, if you don't already have one for this purpose.

1. On the Azure portal, click on the storage accounts link on the sidebar, or use the search resource option to search for "storage account" if this link isn't present on your sidebar.

2. Click on add new at the top left corner of the storage account panel.

3. Fill in the required information on the first panel. You can accept the defaults for the next sections, and then create your storage account.

4. Open your newly created storage account and create a new container.

5. On your storage account page, click on "access keys" on the sidebar, then copy the account name and key1.

That's it for the storage account setup. We then proceed to the next step.

STEP 2 Installing The Azure Repository Plugin For Elasticsearch

To start taking snapshots, we first need to register a snapshot repository within our Elasticsearch cluster. The repository defines where Elasticsearch stores the snapshots it takes; learn more about it here. A repository could be an HDFS or a cloud storage service, and in this case we are using the Azure Blob Storage service.

Before the repository can be registered, the Azure repository plugin has to be installed. SSH into each node of your ES cluster and run the following command

sudo bin/elasticsearch-plugin install repository-azure

If this doesn't work, try the command below. The location of the bin folder depends on how Elasticsearch was installed

sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install repository-azure

Afterwards, restart your Elasticsearch cluster

sudo systemctl restart elasticsearch

Next, send a GET request to the following endpoint to confirm the plugin is installed: http://eshost:port/_nodes?filter_path=nodes.*.plugins. The response should be an object containing details of the plugins installed on each node of the cluster.
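As a rough sketch, this check can also be scripted. The helper below takes the parsed JSON returned by that endpoint and lists the nodes that do not report the plugin. The function name and the sample response are made up for illustration, and the sketch assumes the response keeps the nodes.<id>.plugins[].name shape.

```javascript
// Hypothetical helper: returns the IDs of nodes whose plugin list lacks `pluginName`.
function nodesMissingPlugin(nodesResponse, pluginName) {
  const nodes = nodesResponse.nodes || {};
  return Object.keys(nodes).filter((nodeId) => {
    const plugins = nodes[nodeId].plugins || [];
    return !plugins.some((plugin) => plugin.name === pluginName);
  });
}

// Made-up sample response: two nodes, only one of which has the plugin.
const sampleNodesResponse = {
  nodes: {
    "node-a": { plugins: [{ name: "repository-azure" }] },
    "node-b": { plugins: [] },
  },
};

console.log(nodesMissingPlugin(sampleNodesResponse, "repository-azure")); // [ 'node-b' ]
```

An empty array means every node in the response reports the plugin.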

After confirming the plugin is installed, the next step is to set it up to work with our storage account. We configure the plugin with the storage account name and key, which are stored in the Elasticsearch keystore

bin/elasticsearch-keystore add azure.client.default.account

Enter your Azure storage account name when prompted. Next, run

bin/elasticsearch-keystore add azure.client.default.key

Enter your account key when prompted. If you have issues running these commands, confirm the location of your Elasticsearch bin folder; depending on your installation configuration, you could try the path /usr/share/elasticsearch/bin/elasticsearch-keystore. After setting the values in the keystore, restart your Elasticsearch cluster.

sudo systemctl restart elasticsearch

Next, we set up a snapshot repository. You can do this by sending a POST request to the following endpoint

http://eshost:port/_snapshot/name-of-your-repo

with the following JSON payload

{
    "type": "azure",
    "settings": {
        "container": "backup-container",
        "base_path": "backups",
        "chunk_size": "32MB",
        "compress": true
    }
}

You can leave the type as is. In the settings, the container should be the name of the container created in your storage account. The base_path defines a folder within the container where snapshot data is stored; this is useful if you are taking snapshots of different indices, or even data from different clusters, and storing them in one container.

The chunk_size sets how large a single transferred piece can be: files bigger than this are broken down into chunks before being transferred. You can get more details about the settings by following this link. Below is a screenshot of my settings

You should get "acknowledged": true as a response. You can view all your registered repositories by making a GET request to the following endpoint: http://eshost:port/_snapshot/
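For scripting the registration, here is a minimal sketch that only assembles the request described above; it does not send anything. The function name, the localhost host and the repository name azureblob_backup are placeholders of mine, so swap in your own values.

```javascript
// Builds the request that registers an Azure snapshot repository.
// All argument values used below are placeholders, not required names.
function buildRepositoryRequest(host, repoName, container, basePath) {
  return {
    method: "POST",
    url: `${host}/_snapshot/${encodeURIComponent(repoName)}`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      type: "azure",
      settings: {
        container: container,
        base_path: basePath,
        chunk_size: "32MB",
        compress: true,
      },
    }),
  };
}

const repoRequest = buildRepositoryRequest(
  "http://localhost:9200",
  "azureblob_backup",
  "backup-container",
  "backups"
);
console.log(repoRequest.method, repoRequest.url);
console.log(repoRequest.body);
```

The returned object can be handed to whichever HTTP client you prefer.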

STEP 3 Taking Actual Snapshots

Now let's move on to taking actual snapshots. For this I've created a sample index, "sample_records". We could back up just this index or, better still, back up all indices in the cluster along with the cluster's settings. To take a snapshot, make a POST request to the following endpoint (azureblob_backup here is the name I gave my repository)

http://eshost:port/_snapshot/azureblob_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E

with the following payload

{
  "indices": "index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": true,
  "partial" : true
}

By default, when the ignore_unavailable option is not set and an index is missing, the snapshot request will fail. By setting include_global_state to false, it's possible to prevent the cluster global state from being stored as part of the snapshot. By default, the entire snapshot will fail if one or more indices participating in the snapshot don't have all primary shards available. This behaviour can be changed by setting partial to true. This is from Elasticsearch's official docs here. Below is a snapshot of my backup request,

Notice that I didn't add the indices parameter here. When the indices parameter isn't included, all indices present in the cluster are included in the snapshot. Take note of the snapshot name, %3Csnapshot-%7Bnow%2Fd%7D%3E. This is the URL-encoded version of <snapshot-{now/d}>, which is translated to the date the snapshot was taken, e.g. snapshot-2020.04.09. Also note that it isn't a prerequisite to name your backups this way; you can give them any name you want. This convention just makes snapshots easy to reference if you are doing daily or weekly backups via cron, for example.
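To make the naming concrete, here is a small sketch that produces the encoded name with the built-in encodeURIComponent and approximates the name the cluster ends up with. The two function names are mine, and the second function assumes the default date format (yyyy.MM.dd) and UTC.

```javascript
// Date math expression used as the snapshot name in this tutorial.
const DATE_MATH_NAME = "<snapshot-{now/d}>";

// URL-encode the expression so it can be used as a path segment.
function encodeSnapshotName(name) {
  return encodeURIComponent(name);
}

// Approximates the resolved snapshot name for a given date,
// assuming the default yyyy.MM.dd format and UTC.
function resolvedSnapshotName(date) {
  const pad = (n) => String(n).padStart(2, "0");
  const year = date.getUTCFullYear();
  const month = pad(date.getUTCMonth() + 1);
  const day = pad(date.getUTCDate());
  return `snapshot-${year}.${month}.${day}`;
}

console.log(encodeSnapshotName(DATE_MATH_NAME)); // %3Csnapshot-%7Bnow%2Fd%7D%3E
console.log(resolvedSnapshotName(new Date(Date.UTC(2020, 3, 9)))); // snapshot-2020.04.09
```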

Next, we are going to monitor the status of an ongoing snapshot. To do this, send a GET request to the following endpoint,

http://eshost:port/_snapshot/azureblob_backup/<snapshot-name>

Note that the URL-encoded now format won't work here, even if you used it to save your snapshot; use the literal snapshot name instead, e.g.

http://eshost:port/_snapshot/azureblob_backup/snapshot-2020.04.09

You should get a response like the following.

{
    "snapshots": [
        {
            "snapshot": "snapshot-2020.04.09",
            "uuid": "OjLZEfXDS-mKVqsSi7VteQ",
            "version_id": 6080399,
            "version": "6.8.3",
            "indices": [
               "sample_records"
            ],
            "include_global_state": true,
            "state": "SUCCESS",
            "start_time": "2020-04-08T09:53:47.926Z",
            "start_time_in_millis": 1586339627926,
            "end_time": "2020-04-08T10:03:15.361Z",
            "end_time_in_millis": 1586340195361,
            "duration_in_millis": 567435,
            "failures": [],
            "shards": {
                "total": 15,
                "failed": 0,
                "successful": 15
            }
        }
    ]
}

Take note of the "indices" array, which shows the names of the backed-up indices. The "state" field shows the current status of the snapshot; it can be IN_PROGRESS, FAILED, SUCCESS or PARTIAL. A PARTIAL state means some indices could not be backed up completely, because at least one of their shards failed; the affected indices are recorded in the "failures" array.
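As an illustration of reading this response in code, the sketch below turns it into a one-line summary. The helper name is hypothetical, the sample object is a trimmed copy of the response above, and the PARTIAL branch assumes each entry in "failures" carries an index field.

```javascript
// Hypothetical helper: summarizes a "get snapshot" response in one line.
function summarizeSnapshot(response) {
  const snap = response.snapshots[0];
  switch (snap.state) {
    case "SUCCESS": {
      const minutes = Math.floor(snap.duration_in_millis / 60000);
      return `${snap.snapshot}: ${snap.indices.length} index(es) backed up in about ${minutes} minute(s)`;
    }
    case "PARTIAL": {
      // Assumption: every failure entry names the index it belongs to.
      const failedIndices = [...new Set(snap.failures.map((f) => f.index))];
      return `${snap.snapshot}: partial, problems in: ${failedIndices.join(", ")}`;
    }
    case "IN_PROGRESS":
      return `${snap.snapshot}: still running`;
    default:
      return `${snap.snapshot}: ${snap.state}`;
  }
}

// Trimmed copy of the response shown above.
const sampleSnapshotResponse = {
  snapshots: [
    {
      snapshot: "snapshot-2020.04.09",
      indices: ["sample_records"],
      state: "SUCCESS",
      duration_in_millis: 567435,
      failures: [],
    },
  ],
};

console.log(summarizeSnapshot(sampleSnapshotResponse));
// snapshot-2020.04.09: 1 index(es) backed up in about 9 minute(s)
```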

Now, let's check our Azure storage to see if the snapshot was saved in our container.

There we go! A snapshot of our entire cluster!

Restoring A Snapshot

This assumes you have a new, empty cluster you wish to copy your snapshot data to. First, configure the Azure repository plugin on the new cluster by following the same steps as above. Use the same storage account that was used to take the snapshot, keys and all, and ensure that the base_path matches the one used when creating the snapshot.

Next, send a POST request to the following endpoint to restore your snapshot to the new cluster,

http://eshost:port/_snapshot/<repo-name>/<snapshot-name>/_restore

Note that there are compatibility requirements for restoring snapshots. If you plan to move data from one ES cluster to another, be aware that not every version can read your snapshot data. The compatible ranges are listed below.

A snapshot of an index created in 6.x can be restored to 7.x.
A snapshot of an index created in 5.x can be restored to 6.x.
A snapshot of an index created in 2.x can be restored to 5.x.
A snapshot of an index created in 1.x can be restored to 2.x.

Conversely, snapshots of indices created in 1.x cannot be restored to 5.x or 6.x, snapshots of indices created in 2.x cannot be restored to 6.x or 7.x, and snapshots of indices created in 5.x cannot be restored to 7.x or 8.x.
This is from the official elasticsearch docs.
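The rules quoted above can be written down as a small lookup, which may be handy as a pre-flight check before a migration. This is only a sketch of the listed ranges: the names are mine, and treating a restore into the same major version as allowed is my own addition, not part of the quoted list.

```javascript
// From the list above: major version an index was created in -> the newer major it can be restored to.
const NEXT_RESTORABLE_MAJOR = { 1: 2, 2: 5, 5: 6, 6: 7 };

// Returns true when a snapshot of an index created in `indexCreatedMajor`
// can be restored to a cluster running `clusterMajor`.
function canRestore(indexCreatedMajor, clusterMajor) {
  // Assumption: restoring into the same major version is also fine.
  if (indexCreatedMajor === clusterMajor) return true;
  return NEXT_RESTORABLE_MAJOR[indexCreatedMajor] === clusterMajor;
}

console.log(canRestore(6, 7)); // true
console.log(canRestore(5, 7)); // false
console.log(canRestore(2, 5)); // true
```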

Bonus! Automating Things

The most popular use case of the Elasticsearch snapshot module is making backups of your cluster, and more often than not backups are automated. So I've written a simple Node script that takes a snapshot of your cluster and sends a mail notification with the backup status. This script can be triggered by a cron job set to run daily, or however frequently you'd like.

const axios = require("axios")
const nodemailer = require('nodemailer');

/**
 * Configure email 
 */
let transporter = nodemailer.createTransport({
  service: 'emailservice',
  auth: {
    user: '[email protected]',
    pass: '*****************'
  }
});

const SNAPSHOT_URL = 'http://localhost:9200/_snapshot/azureblob_backup/'
const CLUSTER_NAME =  'tutorial_cluster';

let dateObj = new Date();
let month = dateObj.getUTCMonth() + 1; //months from 1-12
let day = dateObj.getUTCDate();
let year = dateObj.getUTCFullYear();
let hour = dateObj.getUTCHours();
let minute = dateObj.getUTCMinutes();
let seconds = dateObj.getUTCSeconds();



let backuptime = `${year}-${month}-${day}-${hour}-${minute}-${seconds}`;



axios.post(`${SNAPSHOT_URL}snapshot-${backuptime}`,{
    "ignore_unavailable": true,
    "include_global_state": true
  }).then((response)=>{
            console.log(response.data.accepted)

            if(response.data.accepted === true){
                console.log("start checking for status")
                checker();
            }else{
                console.log("send failure notification")
                notify(`Could not start backup for ${CLUSTER_NAME}`)
            }

        },(error)=>{
            console.log("Backup Not Started Error ===>", error)
            notify(`Could not start backup for ${CLUSTER_NAME}`)

        });



  let checker = function(){

       let intervalId =  setInterval(() =>{
            console.log("checking.....")
           axios.get(`${SNAPSHOT_URL}snapshot-${backuptime}`)
                .then(function (response) {
                    // handle success
                    let status = response.data.snapshots[0].state;

                    console.log(status);

                    if(status === 'SUCCESS'){
                        //send a success mail & clear interval
                        clearInterval(intervalId);
                        // SNAPSHOT_URL already starts with http://, so it is used as is
                        notify(` ${CLUSTER_NAME} Has Been Backed Up Successfully \n  completed in ${milisecConvert(response.data.snapshots[0].duration_in_millis)} minute(s)  \n please check  ${SNAPSHOT_URL}snapshot-${backuptime} for details`)

                    }else if(status === 'ABORTED' || status === 'FAILED' ){
                        //send failure message & clear interval
                        clearInterval(intervalId);
                        notify(` ${CLUSTER_NAME} Backup Failed  please check  ${SNAPSHOT_URL}snapshot-${backuptime} for details `)
                    }else if(status === 'PARTIAL'){
                        //send partial backup message & clear interval
                        clearInterval(intervalId);
                        notify(` ${CLUSTER_NAME} Backed up with a few issues please check  ${SNAPSHOT_URL}snapshot-${backuptime} for details `)
                    }
                    else{  
                        //continue
                    }
                })
                .catch(function (error) {
                    console.log("request status error >>>>>", error)
                    clearInterval(intervalId);
                })
        },5000);

  }



let notify = (message)=>{
        //set mail options
        let mailOptions = {
            from: '[email protected]',
            to: '[email protected]',
            subject: ` ${CLUSTER_NAME} Elasticsearch Backup Notification`,
            text: message
        };
  
       
        transporter.sendMail(mailOptions, (error, info)=>{
            if (error) {
            console.log(error);
            } else {
            console.log('Email sent: ' + info.response);
            }
        });
}


let milisecConvert = (milisec)=>{
        // convert the duration to whole minutes
        let minutes = Math.floor(milisec/1000/60);
        return minutes >= 1 ? minutes : 'less than 1';
}

For this script to work, you need to install its dependencies, axios and nodemailer (npm install axios nodemailer), and your snapshot repository should already be set up. You can, and should, set up a cron job to run this script at specified intervals.

NOTES

Elasticsearch snapshots are incremental: a snapshot only copies data that isn't already stored in the repository from an earlier snapshot. This makes the snapshot process run a lot faster after the first run.
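As a toy model of what "incremental" means here (not how Elasticsearch is actually implemented, and with made-up file names), think of each snapshot as copying only the files the repository doesn't hold yet:

```javascript
// Toy model: given the files already in the repository and the files the
// index currently consists of, return only the files that still need copying.
function filesToCopy(repositoryFiles, currentIndexFiles) {
  const alreadyStored = new Set(repositoryFiles);
  return currentIndexFiles.filter((file) => !alreadyStored.has(file));
}

// Made-up file names for illustration.
const storedAfterFirstSnapshot = ["segment_1", "segment_2"];
const indexFilesNow = ["segment_1", "segment_2", "segment_3"];

console.log(filesToCopy(storedAfterFirstSnapshot, indexFilesNow)); // [ 'segment_3' ]
```

The first snapshot has to copy everything; later ones only copy what changed since.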

If you do set up a cron job for your snapshots, take into consideration how often your data changes or grows when choosing the interval.

Did this help? Let me know.

O dabọ ✌