Like any other database, a self-managed Elasticsearch cluster needs provisions for data backups. Elasticsearch backups can't be made by simply copying Elasticsearch data files from one disk to another. This tutorial guides you through using the Elasticsearch snapshot module to create cluster snapshots, and uses Azure Blob Storage to store the backed-up data securely. Besides backups, the snapshot API also comes in handy for migrating data from one cluster to another.

As mentioned above, this tutorial uses Azure Blob Storage as the backup store. Other storage services such as Amazon S3 can be used as well, but to follow this tutorial end to end you'll need an Azure subscription (you can sign up for a free trial here). You'll also need terminal access to your Elasticsearch cluster nodes.

Moving on...

STEP 1 Setting Up An Azure Blob Storage Account

Follow the steps below to create an Azure Blob Storage account, if you don't already have one for this purpose.

1. On the Azure portal, click the Storage accounts link on the sidebar. If the link isn't on your sidebar, use the resource search to search for "storage account".
2. Click Add at the top left corner of the storage accounts panel.
3. Fill in the required information on the first panel, accept the defaults for the next sections, and create your storage account.
4. Open your newly created storage account and create a new container.
5. On your storage account page, click "Access keys" on the sidebar and copy the storage account name and key1.
That's it for the storage account setup; now on to the next step.

STEP 2 Installing The Azure Repository Plugin For Elasticsearch

To start taking snapshots, we first need to register a snapshot repository within our Elasticsearch cluster. This repository defines where Elasticsearch should store the snapshots it takes; learn more about it here. Repositories can be backed by HDFS or by a cloud storage service, and in this case we are using Azure Blob Storage.

Elasticsearch needs the repository-azure plugin to talk to Azure. SSH into your Elasticsearch node (repeat this on every node if your cluster has more than one) and enter the following command:

    sudo bin/elasticsearch-plugin install repository-azure

If this doesn't work, try the command below. The location of the bin folder depends on how Elasticsearch was installed:

    sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install repository-azure

Afterwards, restart Elasticsearch:

    sudo systemctl restart elasticsearch

Next, hit the following endpoint to confirm the plugin is installed. You should get an object containing details of the plugins installed on each node of the cluster.

    http://eshost:port/_nodes?filter_path=nodes.*.plugins

After confirming the plugin is installed, the next step is to configure it to work with our storage account, using the storage account name and key:

    bin/elasticsearch-keystore add azure.client.default.account

Enter your Azure storage account name when prompted. Next:

    bin/elasticsearch-keystore add azure.client.default.key

Enter your account key when prompted. If you have issues running these commands, confirm the location of your Elasticsearch bin folder; depending on your installation, you could try /usr/share/elasticsearch/bin/elasticsearch-keystore. After setting the values in the keystore, restart Elasticsearch:

    sudo systemctl restart elasticsearch
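The plugin check above can also be scripted, which helps on a multi-node cluster where every node needs the plugin. Below is a small JavaScript sketch that reads the _nodes response and lists the nodes where repository-azure is missing. The function names nodesMissingPlugin and checkAzurePlugin are hypothetical, the fetch helper assumes Node 18 or later (built-in fetch), and the sample response is made up. The helper also asks for nodes.*.name so that a node without any plugins is not silently filtered out of the response.

```javascript
// Returns the ids of nodes whose plugin list lacks pluginName.
// Expects the JSON body of GET /_nodes?filter_path=nodes.*.name,nodes.*.plugins,
// where each plugin entry is an object with a "name" field.
function nodesMissingPlugin(nodesResponse, pluginName = "repository-azure") {
  const nodes = (nodesResponse && nodesResponse.nodes) || {};
  return Object.entries(nodes)
    .filter(([, node]) => !(node.plugins || []).some((plugin) => plugin.name === pluginName))
    .map(([nodeId]) => nodeId);
}

// Hypothetical helper: fetches the plugin listing (needs Node 18+ for global fetch).
// Requests nodes.*.name too, so a node with no plugins still appears in the output.
async function checkAzurePlugin(baseUrl) {
  const res = await fetch(`${baseUrl}/_nodes?filter_path=nodes.*.name,nodes.*.plugins`);
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  const missing = nodesMissingPlugin(await res.json());
  console.log(
    missing.length === 0
      ? "repository-azure is installed on every node"
      : `repository-azure is missing on: ${missing.join(", ")}`
  );
  return missing;
}

// Example with a made-up, trimmed response for a two-node cluster.
const sample = {
  nodes: {
    nodeA: { name: "es-node-1", plugins: [{ name: "repository-azure", version: "6.8.3" }] },
    nodeB: { name: "es-node-2", plugins: [] },
  },
};
console.log(nodesMissingPlugin(sample)); // [ 'nodeB' ]
```

Against a live cluster you would call checkAzurePlugin("http://eshost:port") with your own host and port; any node id it reports still needs the plugin install and a restart.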
Next, we set up a snapshot repository. You can do this by sending a POST request to the following endpoint with the JSON payload below:

    http://eshost:port/_snapshot/name-of-your-repo

    {
      "type": "azure",
      "settings": {
        "container": "backup-container",
        "base_path": "backups",
        "chunk_size": "32MB",
        "compress": true
      }
    }

You can leave the type as is. In the settings, container should be the name of the container created in your storage account. base_path defines a folder within the container where snapshot data should be stored; this is useful if you are taking snapshots of different indices, or even of data from different clusters, and storing them all in one container. chunk_size defines the size of the chunks that big files are broken into before being transferred. You can get more details about the settings by following this link. Below is a screenshot of my settings.

You should get "acknowledged": true as a response. You can view all your registered repositories by making a GET request to the following endpoint:

    http://eshost:port/_snapshot/

STEP 3 Taking Actual Snapshots

Now let's move on to taking actual snapshots. For this I've created a sample index, "sample_records". We could back up just this index, or better still, back up all indices in the cluster along with the cluster's global state. To do this, make a POST request to the following endpoint (azureblob_backup is the name I gave my repository):

    http://eshost:port/_snapshot/azureblob_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E

with the following payload:

    {
      "indices": "index_1,index_2",
      "ignore_unavailable": true,
      "include_global_state": true,
      "partial": true
    }

By default, when the ignore_unavailable option is not set and an index is missing, the snapshot request will fail. By setting include_global_state to false, it's possible to prevent the cluster global state from being stored as part of the snapshot. By default, the entire snapshot will fail if one or more indices participating in the snapshot don't have all primary shards available.
This behaviour can be changed by setting partial to true. These descriptions are from Elasticsearch's official docs here. Below is a screenshot of my backup request.

Notice that I didn't add the indices parameter: when it isn't included, all indices present in the cluster are included in the snapshot. Also take note of the snapshot name, %3Csnapshot-%7Bnow%2Fd%7D%3E. This is the URL-encoded version of <snapshot-{now/d}>, which is translated to the date the snapshot was taken, i.e. snapshot-2020.04.09. It isn't a prerequisite to name your backups this way; you can give them any name you want. This scheme just makes it easy to reference snapshots if you take daily or weekly backups, via cron for example.

Next, let's monitor the status of an ongoing snapshot. To do this, send a GET request to the following endpoint:

    http://eshost:port/_snapshot/azureblob_backup/<snapshot-name>

Note that the URL-encoded format won't work here, even if you used it to create your snapshot; use the literal name instead, e.g. http://eshost:port/_snapshot/azureblob_backup/snapshot-2020.04.09. You should get the following response:
    {
      "snapshots": [
        {
          "snapshot": "snapshot-2020.04.09",
          "uuid": "OjLZEfXDS-mKVqsSi7VteQ",
          "version_id": 6080399,
          "version": "6.8.3",
          "indices": [
            "sample_records"
          ],
          "include_global_state": true,
          "state": "SUCCESS",
          "start_time": "2020-04-08T09:53:47.926Z",
          "start_time_in_millis": 1586339627926,
          "end_time": "2020-04-08T10:03:15.361Z",
          "end_time_in_millis": 1586340195361,
          "duration_in_millis": 567435,
          "failures": [],
          "shards": {
            "total": 15,
            "failed": 0,
            "successful": 15
          }
        }
      ]
    }

Take note of the "indices" array, which shows the names of the backed-up indices. The state shows the current status of the snapshot; it can be IN_PROGRESS, FAILED, SUCCESS or PARTIAL. If the snapshot is in a PARTIAL state, some indices could not be backed up, and the details are saved in the "failures" array.

Now, let's check our Azure storage to see if the snapshot was saved in our container. There we go! A snapshot of our entire cluster!

Restoring A Snapshot

This assumes you have a new, empty cluster you wish to copy your snapshot data to. First, on the new cluster, install and configure the Azure repository plugin and register the repository by following the same steps above, using the same storage account (keys and all) that was used to take the snapshot. Also ensure that the base_path matches the one used when creating the snapshot. Next, send a POST request to the following endpoint to restore your snapshot to the new cluster:

    http://eshost:port/_snapshot/<repo-name>/<snapshot-name>/_restore

Note that there are compatibility requirements for backups. If you plan to export data from one Elasticsearch cluster to another, be aware that not all versions are compatible with your exported data. Below are the compatibility ranges:

- A snapshot of an index created in 6.x can be restored to 7.x.
- A snapshot of an index created in 5.x can be restored to 6.x.
- A snapshot of an index created in 2.x can be restored to 5.x.
- A snapshot of an index created in 1.x can be restored to 2.x.

Conversely, snapshots of indices created in 1.x cannot be restored to 5.x or 6.x, snapshots of indices created in 2.x cannot be restored to 6.x or 7.x, and snapshots of indices created in 5.x cannot be restored to 7.x or 8.x. This is from the official Elasticsearch docs.

Bonus! Automating Things

The most popular use of the Elasticsearch snapshot module is making backups of your cluster, and more often than not, backups are automated. So I've written a simple Node script that takes a snapshot of your cluster and sends an email notification with the backup status. The script can be triggered by a cron job set to run daily, or however frequently you'd like.

    const axios = require("axios");
    const nodemailer = require("nodemailer");

    /**
     * Configure email
     */
    const transporter = nodemailer.createTransport({
      service: "emailservice",
      auth: {
        user: "somemail@domain.com",
        pass: "*****************",
      },
    });

    const SNAPSHOT_URL = "http://localhost:9200/_snapshot/azureblob_backup/";
    const CLUSTER_NAME = "tutorial_cluster";

    // Build a UTC timestamp for the snapshot name, e.g. 2020-4-9-10-3-5
    const dateObj = new Date();
    const month = dateObj.getUTCMonth() + 1; // months from 1-12
    const day = dateObj.getUTCDate();
    const year = dateObj.getUTCFullYear();
    const hour = dateObj.getUTCHours();
    const minute = dateObj.getUTCMinutes();
    const seconds = dateObj.getUTCSeconds();
    const backuptime = `${year}-${month}-${day}-${hour}-${minute}-${seconds}`;

    // Start the snapshot; without wait_for_completion Elasticsearch replies
    // { "accepted": true } and runs the snapshot in the background
    axios
      .post(`${SNAPSHOT_URL}snapshot-${backuptime}`, {
        ignore_unavailable: true,
        include_global_state: true,
      })
      .then(
        (response) => {
          console.log(response.data.accepted);
          if (response.data.accepted === true) {
            console.log("start checking for status");
            checker();
          } else {
            console.log("send failure notification");
            notify(`Could not start backup for ${CLUSTER_NAME}`);
          }
        },
        (error) => {
          console.log("Backup Not Started Error ===>", error);
          notify(`Could not start backup for ${CLUSTER_NAME}`);
        }
      );

    // Poll the snapshot status every 5 seconds until it reaches a final state
    function checker() {
      const intervalId = setInterval(() => {
        console.log("checking.....");
        axios
          .get(`${SNAPSHOT_URL}snapshot-${backuptime}`)
          .then((response) => {
            // handle success
            const snapshot = response.data.snapshots[0];
            const status = snapshot.state;
            console.log(status);
            if (status === "SUCCESS") {
              // send a success mail & clear interval
              clearInterval(intervalId);
              notify(`${CLUSTER_NAME} Has Been Backed Up Successfully \n completed in ${milisecConvert(snapshot.duration_in_millis)} minute(s) \n please check ${SNAPSHOT_URL}snapshot-${backuptime} for details`);
            } else if (status === "ABORTED" || status === "FAILED") {
              // send failure message & clear interval
              clearInterval(intervalId);
              notify(`${CLUSTER_NAME} Backup Failed please check ${SNAPSHOT_URL}snapshot-${backuptime} for details`);
            } else if (status === "PARTIAL") {
              // send failure message & clear interval
              clearInterval(intervalId);
              notify(`${CLUSTER_NAME} Backed up with a few issues please check ${SNAPSHOT_URL}snapshot-${backuptime} for details`);
            }
            // otherwise (IN_PROGRESS) keep polling
          })
          .catch((error) => {
            console.log("request status error >>>>>", error);
            clearInterval(intervalId);
          });
      }, 5000);
    }

    // Send the status email
    function notify(message) {
      // set mail options
      const mailOptions = {
        from: "somemail@domain.com",
        to: "tech@yourcompany.com",
        subject: `${CLUSTER_NAME} Elasticsearch Backup Notification`,
        text: message,
      };
      transporter.sendMail(mailOptions, (error, info) => {
        if (error) {
          console.log(error);
        } else {
          console.log("Email sent: " + info.response);
        }
      });
    }

    // Convert a duration in milliseconds to whole minutes
    function milisecConvert(milisec) {
      const minutes = Math.floor(milisec / 1000 / 60);
      return minutes >= 1 ? minutes : "less than 1";
    }

You need to install the axios and nodemailer dependencies (npm install axios nodemailer) for this script to work, and your snapshot repository should already be set up.
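The script above names each snapshot with its own timestamp. If you would rather use the daily date-math name from Step 3, remember that the create request takes the URL-encoded <snapshot-{now/d}>, while the status request needs the resolved literal name. Here is a small JavaScript sketch of both, assuming Elasticsearch's default yyyy.MM.dd date format and UTC, as in the snapshot-2020.04.09 example. encodedSnapshotName, resolvedDailyName and snapshotUrls are hypothetical helper names, not part of the original script.

```javascript
// Assumption: date math uses the default yyyy.MM.dd format and UTC,
// matching the snapshot-2020.04.09 example in this post.
const DATE_MATH_NAME = "<snapshot-{now/d}>";

// URL-encoded form for the create (POST) request.
function encodedSnapshotName(name = DATE_MATH_NAME) {
  return encodeURIComponent(name);
}

// Literal name for a given day, in UTC with zero-padded month and day.
function resolvedDailyName(date = new Date(), prefix = "snapshot-") {
  const pad = (n) => String(n).padStart(2, "0");
  return `${prefix}${date.getUTCFullYear()}.${pad(date.getUTCMonth() + 1)}.${pad(date.getUTCDate())}`;
}

// Hypothetical helper: both URLs for one run of the backup script.
function snapshotUrls(snapshotUrl, date = new Date()) {
  return {
    create: `${snapshotUrl}${encodedSnapshotName()}`,
    status: `${snapshotUrl}${resolvedDailyName(date)}`,
  };
}

const urls = snapshotUrls(
  "http://localhost:9200/_snapshot/azureblob_backup/",
  new Date(Date.UTC(2020, 3, 9, 10, 30)) // 9 April 2020, 10:30 UTC (months are 0-based)
);
console.log(urls.create); // http://localhost:9200/_snapshot/azureblob_backup/%3Csnapshot-%7Bnow%2Fd%7D%3E
console.log(urls.status); // http://localhost:9200/_snapshot/azureblob_backup/snapshot-2020.04.09
```

Keep in mind that {now/d} is resolved when Elasticsearch receives the request, so a run that starts right around midnight UTC could resolve to a different day than the one computed in the script. Snapshot names must also be unique within a repository, so a daily name only works if the script runs at most once a day.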
You can, and should, set up a cron job to run this script at specified intervals.

NOTES

Elasticsearch snapshots are incremental: data files already stored in the repository by an earlier snapshot aren't copied again, and the new snapshot simply references them. This makes the snapshot process run a lot faster, especially after the first run. If you do set up a cron job for your snapshots, consider how often your data changes or grows when choosing the interval.

Did this help? Let me know.

O dabọ ✌