This article is a straightforward guide to cloning an Elasticsearch index without setting up a separate work environment, which can help streamline your daily routine.

In development, the need to quickly replicate a database comes up often. A copy serves several purposes: it supports development, enables testing of new features, and protects data integrity, all while avoiding the conflicts that arise when several colleagues work on the same dataset (here, Elasticsearch).

Recently, my colleague and I had to work on a related feature. We got in each other's way and, at some point, decided to create a copy of the database to separate our testing. Here's how we did it:

Step 1: Locating the Target Index

The first step in duplicating a database is identifying the exact index you intend to copy. This index, which we'll refer to as the "current index name", can typically be found in your application's configuration file. In this file you'll find a list of the indexes your application uses, each serving a specific purpose.

application.yml

indexes:
  sims: ${INDEX_SIMS:real_sims}
  accounts: ${INDEX_ACCOUNTS:real_accounts}

Step 2: Creating the .env File

Once you have identified the target index, create a dedicated ".env" file. Each entry in the configuration above reads an environment variable, and the value after the colon is the default used when that variable is not set. The ".env" file therefore only needs to override those variables with the names of the new indexes:

INDEX_SIMS=new_sims
INDEX_ACCOUNTS=new_accounts

Step 3: Using the Elasticsearch Reindex API

Now for the practical part. The Elasticsearch Reindex API copies documents from a source index to a destination index. The request needs only two parameters: the name of the source index and the name of the destination index. Note that reindexing copies documents only. It does not copy the settings or mappings of the source index, so create the destination index with the mappings you need beforehand if the dynamically generated ones are not sufficient.

curl -X POST "localhost:9200/_reindex" -H "Content-Type: application/json" -d'
{
  "source": {
    "index": "current_index_name"
  },
  "dest": {
    "index": "new_index_name"
  }
}
'

In our case, written as a console request, the real example is:

POST _reindex
{
  "source": {
    "index": "real_sims"
  },
  "dest": {
    "index": "new_sims"
  }
}

Step 4: Understanding the Size Implications

Keep storage in mind when duplicating an index. The new index is independent of the original and consumes its own disk space, so if the source and destination indexes live in the same Elasticsearch cluster, the storage needed for that data roughly doubles. The original index remains unaltered unless you explicitly delete it. You can check the size of an index with:

GET /index_name/_stats

In our case, the index turned out to be very small. Please check all of your indexes and inform the responsible person that the database size will temporarily double.

Step 5: Verifying the New Databases

With the duplication complete, verify that the new indexes exist and contain your data. You can inspect a newly created index first, and then search for a document by a unique identifier to confirm that the data was copied:

GET new_sims

GET new_sims/_search
{
  "query": {
    "term": {
      "id": {
        "value": "eb199422-3385-4c24-92a8-1b6fba9ef802"
      }
    }
  }
}

Once your clones are ready, you can launch the application with the new environment.

Step 6: Integration into Your Workflow

Having duplicated the index, you can now integrate it into your development, testing, or any other workflow by configuring your environment to use the new indexes through the ".env" file. This gives you a separate workspace for experimentation, development, and testing.

I use IntelliJ IDEA for work. Setting up the new databases there takes three steps:

Open the application's configuration menu.
Enable EnvFile and set up its source.
Run the application with the new configuration.

Step 7: Conclusion and Cleanup

As you wrap up your tasks, remember to clean up. Deleting redundant indexes keeps unnecessary data from cluttering your storage. Take care, however, not to touch your current, production-ready database.

DELETE index_name

In our case, that is:

DELETE new_sims

Before you start implementing, follow me and subscribe on LinkedIn and Hackernoon.
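
The reindex and verification steps above can also be scripted. The following is a minimal Python sketch under stated assumptions, not the method used in this article: it assumes an unsecured cluster reachable at a base URL such as http://localhost:9200, it uses only the standard library, and the helper names (reindex_request, count_request, send_json, clone_index) are hypothetical. It calls the _reindex endpoint with refresh=true and then compares document counts from the _count endpoint, a coarse check that the article itself does not perform.

```python
import json
import urllib.request


def reindex_request(base_url, source, dest):
    """Build (but do not send) a POST /_reindex request copying source into dest."""
    body = {"source": {"index": source}, "dest": {"index": dest}}
    return urllib.request.Request(
        # refresh=true makes the copied documents visible to the count check below
        url=f"{base_url}/_reindex?refresh=true",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def count_request(base_url, index):
    """Build (but do not send) a GET /<index>/_count request."""
    return urllib.request.Request(url=f"{base_url}/{index}/_count", method="GET")


def send_json(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def clone_index(base_url, source, dest, sender=send_json):
    """Copy source into dest and report whether the document counts match.

    `sender` is injectable so the logic can be exercised without a cluster.
    """
    result = sender(reindex_request(base_url, source, dest))
    if result.get("failures"):
        raise RuntimeError(f"reindex reported failures: {result['failures']}")
    source_count = sender(count_request(base_url, source))["count"]
    dest_count = sender(count_request(base_url, dest))["count"]
    return source_count == dest_count
```

For example, clone_index("http://localhost:9200", "real_sims", "new_sims") would return True when both indexes report the same number of documents. Matching counts only show that nothing was dropped, so the lookup by ID from Step 5 is still worth running.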

How to Reindex a Database in Elasticsearch
