This article will provide a straightforward guide on creating a clone of an index in Elasticsearch without the need to set up a work environment, helping you streamline your daily routine.
In development, you often need to clone a database quickly. A copy is useful for development, for testing new features, and for protecting data integrity, and it avoids the conflicts that arise when several colleagues work on the same dataset (in our case, an Elasticsearch index). Recently, my colleague and I had to work on a related feature. We kept getting in each other's way and eventually decided to create a copy of the database so that each of us could test separately.
Here’s how we did it:
The first step in duplicating an Elasticsearch database is to identify the exact index you want to copy. This index, which we'll refer to as "current index name," can typically be found in your application's configuration file, "application.yml".
In this file, you'll find the list of indexes your application uses, each serving its own purpose. These are the indexes your data operations run against.
indexes:
  sims: ${INDEX_SIMS:real_sims}
  accounts: ${INDEX_ACCOUNTS:real_accounts}
Once you have identified the target index, the next step is to create a dedicated ".env" file. In the configuration, an entry such as ${INDEX_SIMS:real_sims} reads the INDEX_SIMS environment variable and falls back to real_sims when the variable is not set. In the ".env" file, you set those variables to the names of the new indexes, so the application switches to the copies without any change to "application.yml".
INDEX_SIMS=new_sims
INDEX_ACCOUNTS=new_accounts
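To make the override mechanism concrete, here is a minimal Python sketch of how a ${NAME:default} placeholder resolves against the variables in a ".env" file. It only imitates the behaviour for illustration: the helper names parse_env and resolve are hypothetical, and in a real project your framework performs this resolution for you.

```python
import re

# Matches placeholders of the form ${NAME:default}, as used in application.yml.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):([^}]*)\}")


def parse_env(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def resolve(value, env):
    """Replace each ${NAME:default} with env[NAME], or the default if NAME is unset."""
    return PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(2)), value)


env = parse_env("INDEX_SIMS=new_sims\nINDEX_ACCOUNTS=new_accounts\n")
print(resolve("${INDEX_SIMS:real_sims}", env))  # new_sims
print(resolve("${INDEX_SIMS:real_sims}", {}))   # real_sims
```

With the ".env" file in place the application sees new_sims; without it, the same configuration falls back to real_sims.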
Now for the practical part. The Elasticsearch Reindex API copies the documents of a source index into a destination index. The request needs only two parameters: the name of the source index and the name of the destination index. Keep in mind that reindexing copies documents only, not the settings or mappings of the source index; if those matter for your tests, create the destination index with the desired mappings before you run the request.
curl -X POST "localhost:9200/_reindex" -H "Content-Type: application/json" -d'
{
  "source": {
    "index": "current_index_name"
  },
  "dest": {
    "index": "new_index_name"
  }
}
'
For our indexes, the request looks like this:
POST _reindex
{
  "source": {
    "index": "real_sims"
  },
  "dest": {
    "index": "new_sims"
  }
}
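The same request can also be sent from a script. Here is a minimal Python sketch that uses only the standard library. It assumes a local cluster without authentication at http://localhost:9200, as in the curl example; the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: a local, unsecured cluster, as in the curl example.
ES_URL = "http://localhost:9200"


def build_reindex_request(source, dest, base_url=ES_URL):
    """Build (but do not send) a POST /_reindex request copying source into dest."""
    body = {"source": {"index": source}, "dest": {"index": dest}}
    return urllib.request.Request(
        url=f"{base_url}/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reindex(source, dest, base_url=ES_URL):
    """Send the request and return the parsed JSON response from Elasticsearch."""
    request = build_reindex_request(source, dest, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling reindex("real_sims", "new_sims") performs the copy. In the response, the "total" and "failures" fields show how many documents were processed and whether any of them failed.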
One important thing to understand about duplicating an index is its effect on storage. The new index is independent and takes up its own disk space, so if the source and destination indexes live in the same Elasticsearch cluster, the storage used by that data roughly doubles. The original index remains unchanged unless you explicitly delete it. You can check the size of an index with the stats API:
GET /index_name/_stats
In our case, the index turned out to be very small. Check every index you plan to copy, and let whoever is responsible for the cluster know that the size of the database will temporarily double.
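If you want to read the size out of the stats response in a script, the sketch below shows one way to do it in Python. The sample dictionary is hand-made for illustration and contains only the fields used here; the helper names are hypothetical.

```python
def primary_store_bytes(stats, index):
    """Size of the index's primary shards in bytes, from a parsed _stats response."""
    return stats["indices"][index]["primaries"]["store"]["size_in_bytes"]


def total_store_bytes(stats, index):
    """Size of primaries plus replicas in bytes, from a parsed _stats response."""
    return stats["indices"][index]["total"]["store"]["size_in_bytes"]


def human_size(num_bytes):
    """Format a byte count using binary units (1 KB = 1024 B)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1024


# Hand-made sample with made-up numbers, trimmed to the fields used above.
sample_stats = {
    "indices": {
        "real_sims": {
            "primaries": {"store": {"size_in_bytes": 52428800}},
            "total": {"store": {"size_in_bytes": 104857600}},
        }
    }
}

print(human_size(primary_store_bytes(sample_stats, "real_sims")))  # 50.0 MB
print(human_size(total_store_bytes(sample_stats, "real_sims")))    # 100.0 MB
```

The "total" figure includes replica shards, so it is the better estimate of the extra disk space a clone with the same replica settings would need.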
Once the copy is complete, verify that the new indexes exist and contain your data. Requesting an index by name returns its settings and mappings, which confirms that it was created. You can then search for a document by a known unique identifier to confirm that the data was copied.
GET new_sims
GET new_sims/_search
{
  "query": {
    "term": {
      "id": {
        "value": "eb199422-3385-4c24-92a8-1b6fba9ef802"
      }
    }
  }
}
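Another quick check, which is our addition and not part of the steps above, is to compare the document counts of the source index and its copy using the _count API. This is a minimal Python sketch, assuming a local cluster without authentication; the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: a local, unsecured cluster.
ES_URL = "http://localhost:9200"


def fetch_count(index, base_url=ES_URL):
    """Call GET /<index>/_count and return the number of documents."""
    with urllib.request.urlopen(f"{base_url}/{index}/_count") as response:
        return json.load(response)["count"]


def counts_match(source, dest, count_fn=fetch_count):
    """Return True when both indexes report the same document count."""
    return count_fn(source) == count_fn(dest)
```

For example, counts_match("real_sims", "new_sims") should return True after a complete copy, provided nobody has written to the source index in the meantime. The count_fn parameter exists only so the comparison can be tried without a running cluster.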
Once your clones are ready, you can launch the application with the new environment.
Point the application at the new indexes through the ".env" file, and you have a separate, isolated workspace for development, testing, and experimentation.
I use IntelliJ IDEA for work. Here are examples of setting up new databases.
Open the application's configuration menu.
When you have finished your work, remember to clean up. Deleting the indexes you no longer need keeps your storage free of unnecessary data.
Be careful, though, not to delete or alter your current, production-ready database.
DELETE index_name
In our particular situation, it is:
DELETE new_sims
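If you script the cleanup, a small guard can help keep the original indexes safe. The Python sketch below refuses to build a delete request for protected names or wildcard patterns. The PROTECTED set and the function name are hypothetical, and a local cluster without authentication is assumed.

```python
import urllib.request

# Assumption: a local, unsecured cluster.
ES_URL = "http://localhost:9200"
# Assumption: the original indexes from application.yml that must never be deleted.
PROTECTED = {"real_sims", "real_accounts"}


def build_delete_request(index, base_url=ES_URL, protected=PROTECTED):
    """Build (but do not send) a DELETE /<index> request, refusing risky names."""
    if index in protected:
        raise ValueError(f"refusing to delete protected index: {index}")
    if not index or index == "_all" or any(ch in index for ch in "*,"):
        raise ValueError(f"refusing to delete by pattern or list: {index!r}")
    return urllib.request.Request(url=f"{base_url}/{index}", method="DELETE")
```

Passing the result to urllib.request.urlopen sends the request, for example urllib.request.urlopen(build_delete_request("new_sims")).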