Hey,
today I happened to write a script to solve a specific problem that a good many people seem to face: renaming an Elasticsearch index. Naturally, there are documented solutions, but I couldn’t quickly find a script that would get me where I wanted: all the data from an index named a queryable in an index named b, with all the properties set.
Note: the following code is aimed at Elasticsearch 2.4.6.
Here it comes then.
Reindexing step by step
There are four steps towards our goal, plus a preliminary step 0 that sets up some test data:
- Create an Elasticsearch index and populate it with some data;
- Get the configurations of the original index;
- Create the new index with the desired configuration;
- Run the _reindex action;
- Drop the old index.
0. Create an Elasticsearch index and populate it with some data
To create an index using the default parameters (e.g., number of shards and replicas) we can issue a POST against the Elasticsearch HTTP endpoint specifying the desired index (in this case, acme-production):
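A minimal sketch of that call, assuming Elasticsearch is listening on http://localhost:9200. The `ES_URL` variable and the `create_index` helper are names made up for this example, and the sketch uses PUT, which is the method the create index documentation shows:

```shell
# Assumption: Elasticsearch listens on localhost:9200 (override via ES_URL).
ES_URL="${ES_URL:-http://localhost:9200}"

# Create the index named in $1 with default settings.
# No request body is sent, so shards and replicas use the cluster defaults.
create_index() {
  curl -s -XPUT "$ES_URL/$1"
}

# Usage:
#   create_index acme-production
```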
Which, naturally, has no data indexed:
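One way to check that is the /_cat/indices endpoint, which reports a document count per index. The `list_indices` helper and `ES_URL` below are hypothetical names for this sketch:

```shell
# Assumption: Elasticsearch listens on localhost:9200 (override via ES_URL).
ES_URL="${ES_URL:-http://localhost:9200}"

# List all indices; the "v" parameter asks for a header row so the
# columns (including the document count) are labelled.
list_indices() {
  curl -s "$ES_URL/_cat/indices?v"
}

# Usage:
#   list_indices
```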
Now we populate it with some data:
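Something along these lines would do, assuming a local node. The `employee` type and the document bodies are placeholders I picked for illustration, and `index_doc` is a hypothetical helper:

```shell
# Assumption: Elasticsearch listens on localhost:9200 (override via ES_URL).
ES_URL="${ES_URL:-http://localhost:9200}"

# Index one JSON document ($3) into index $1 under type $2.
# POSTing without an id lets Elasticsearch generate one.
index_doc() {
  curl -s -XPOST "$ES_URL/$1/$2" -d "$3"
}

# Usage (type name and documents are made up):
#   index_doc acme-production employee '{"name":"wile"}'
#   index_doc acme-production employee '{"name":"road-runner"}'
```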
Which we can verify by looking again at the /_cat/indices endpoint:
1. Get the configurations of the original index
Because the renaming is nothing more than “create, copy and delete” we need to create a new index with the properties from the old one. To properly achieve that we must then copy the old configuration:
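A sketch of how that could look, assuming jq is installed and Elasticsearch is on localhost:9200. The `get_index_config` helper is a name invented for this example; `index_config` is the variable used in the rest of the post:

```shell
# Assumption: Elasticsearch listens on localhost:9200 (override via ES_URL).
ES_URL="${ES_URL:-http://localhost:9200}"

# Fetch settings and mappings of index $1 in a single call.
# The response is keyed by the index name, so jq unwraps that key and
# leaves an object that holds only "settings" and "mappings".
get_index_config() {
  curl -s "$ES_URL/$1/_settings,_mappings" | jq --arg index "$1" '.[$index]'
}

# Usage:
#   index_config=$(get_index_config acme-production)
```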
ps.: here I’m making use of jq, the lightweight command-line JSON processor, in order to get the mappings and settings objects from within the bigger object returned by the call to /<index>/_settings,_mappings. This way we’re able to assign that to a variable and then make use of it later.
2. Create the new index with the desired configuration
Using the old configuration (stored in the index_config variable) we’re able to create the index based on it:
ps.: even though there’s a uuid in the $index_config object there, it doesn’t matter - it’ll get replaced by a new uuid in the new index.
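Roughly, that means sending the captured JSON as the body of the create request. This is a sketch under the same assumptions as before (local node, `index_config` already populated); `create_index_from_config` is a hypothetical helper, and acme-staging stands in for the new name:

```shell
# Assumption: Elasticsearch listens on localhost:9200 (override via ES_URL).
ES_URL="${ES_URL:-http://localhost:9200}"

# Create index $1 using the JSON in $2 (settings and mappings) as the body.
create_index_from_config() {
  curl -s -XPUT "$ES_URL/$1" -d "$2"
}

# Usage:
#   create_index_from_config acme-staging "$index_config"
```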
3. Run _reindex action
Having both indices properly configured we’re ready to have the data from the old index in the new one:
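The _reindex API takes a source and a destination index in its request body. A minimal sketch, assuming a local node; `reindex_body` and `reindex` are helper names invented here:

```shell
# Assumption: Elasticsearch listens on localhost:9200 (override via ES_URL).
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the _reindex request body: copy from index $1 into index $2.
reindex_body() {
  printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}' "$1" "$2"
}

# Ask Elasticsearch to copy every document from $1 into $2.
reindex() {
  curl -s -XPOST "$ES_URL/_reindex" -d "$(reindex_body "$1" "$2")"
}

# Usage:
#   reindex acme-production acme-staging
```

Keeping the body in its own function makes it easy to print and inspect before sending anything to the cluster.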
By this time you should already have your new index populated. Now it’s just a matter of deleting the old index.
4. Drop the old index
If you have no intention of making use of the old index, now is the time to drop it:
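Dropping an index is a single DELETE against its name. A sketch, assuming a local node; `delete_index` is a hypothetical helper. It’s probably worth comparing document counts in /_cat/indices before running it, since the deletion can’t be undone:

```shell
# Assumption: Elasticsearch listens on localhost:9200 (override via ES_URL).
ES_URL="${ES_URL:-http://localhost:9200}"

# Permanently delete index $1 and all of its documents.
delete_index() {
  curl -s -XDELETE "$ES_URL/$1"
}

# Usage:
#   delete_index acme-production
```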
What about aliases?
I received some pretty interesting feedback on Reddit that I’d like to share here.
It turns out that sometimes we can avoid reindexing by making use of aliases (see Elasticsearch Indices Aliases).
The idea is that when we need to reference an index by another name, we can create a kind of “pointer” to the real index and perform all the normal operations against this pointer (the alias). The aliases API essentially gives us CRUD (create, read, update and delete) over aliases, which makes it possible to do what we want: achieve a “renaming” of an index, even if only virtually.
Let’s do it then.
First, create an acme-production index just like before and add some data:
then create an alias named acme-staging:
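With the _aliases API this is an `add` action that names the real index and the alias. A minimal sketch, assuming a local node; `add_alias_body` and `add_alias` are helper names made up for this example:

```shell
# Assumption: Elasticsearch listens on localhost:9200 (override via ES_URL).
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the _aliases request body: make alias $2 point at index $1.
add_alias_body() {
  printf '{"actions":[{"add":{"index":"%s","alias":"%s"}}]}' "$1" "$2"
}

# Register the alias with Elasticsearch.
add_alias() {
  curl -s -XPOST "$ES_URL/_aliases" -d "$(add_alias_body "$1" "$2")"
}

# Usage:
#   add_alias acme-production acme-staging
```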
if we check the indices we’ll see that we don’t have any new index though:
But that we do have aliases:
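The /_cat/aliases endpoint lists each alias next to the index it points to. A sketch, assuming a local node; `list_aliases` is a hypothetical helper:

```shell
# Assumption: Elasticsearch listens on localhost:9200 (override via ES_URL).
ES_URL="${ES_URL:-http://localhost:9200}"

# List all aliases; the "v" parameter adds a header row.
list_aliases() {
  curl -s "$ES_URL/_cat/aliases?v"
}

# Usage:
#   list_aliases
```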
which allows us to perform queries against acme-staging and retrieve data from acme-production:
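For instance, a search with no query body against the alias should return the documents stored in the underlying index. A sketch, assuming a local node; `search_all` is a name invented here:

```shell
# Assumption: Elasticsearch listens on localhost:9200 (override via ES_URL).
ES_URL="${ES_URL:-http://localhost:9200}"

# Run a search with no query body against $1, which can be an index
# or an alias; "pretty" asks for human-readable JSON.
search_all() {
  curl -s "$ES_URL/$1/_search?pretty"
}

# Usage:
#   search_all acme-staging
```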
Now, what if we want to disallow requests to the old index, as if we had really renamed it rather than duplicated it? Then we need to close the old index using the open/close index API:
curl -XPOST http://localhost:9200/acme-production/_close
then we can try to get from acme-production:
Cool, what we wanted, huh? Now, if we try to get from acme-staging:
we can’t retrieve anything either.
It sounds logical to me that we can’t as the alias is just a pointer to the other index (which was closed).
So, to sum up, if you want new names to point to an existing index (as if you were renaming it), aliases will save you and you won’t need to copy any data.
If you need something like a “rename” that also disallows access to the old index, then aliases won’t help you (you’ll have to use the reindex + delete strategy).
I had never used aliases before and it’s pretty good to know that they exist! They can definitely be very useful sometimes.
Closing thoughts and resources
As someone who never really dug deep into how Elasticsearch works, I found the whole concept of reindexing very easy. The official documentation is pretty good, and with it I was able to quickly solve the problem. Kudos, Elasticsearch team!
Thanks,
finis
Originally published at ops.tips on November 21, 2017.
