Hey,
today I happened to write a script to solve a specific problem that a good number of people seem to face: renaming a given Elasticsearch index. Naturally, there are documented solutions, but I couldn’t quickly find a script that would get me where I wanted: all the data from an index named a now being queryable in an index named b, with all the properties set.
Note: the following code is aimed at Elasticsearch 2.4.6.
Here it comes then.
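There are four steps to get to our goal: retrieve the configuration of the old index, create the new index with that configuration, copy the data over with the _reindex action, and delete the old index.
First, though, we need an index to rename. To create an index using the default parameters (e.g., number of shards and replicas) we can issue a POST against the Elasticsearch HTTP endpoint specifying the desired index (in this case, acme-production):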
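A minimal sketch of that call as a small shell function, assuming Elasticsearch answers on localhost:9200. The ES_URL variable and the create_index name are mine, not part of Elasticsearch, and the sketch uses PUT, the verb shown in the create index documentation, where the text above mentions POST.

```shell
# Assumption: Elasticsearch is reachable here; override ES_URL if it is not.
ES_URL="${ES_URL:-http://localhost:9200}"

# create_index NAME: create an index with the default shards and replicas.
# (create_index is a hypothetical helper name.)
create_index() {
  curl -s -XPUT "$ES_URL/$1"
}
```

Calling create_index acme-production afterwards would create the sample index.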
Which, naturally, has no data indexed:
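One way to check this is the cat indices endpoint, whose table includes a docs.count column. The sketch below assumes the same local endpoint; list_indices is a hypothetical helper name.

```shell
# Assumption: Elasticsearch is reachable here; override ES_URL if it is not.
ES_URL="${ES_URL:-http://localhost:9200}"

# list_indices: print one row per index; ?v adds the header row
# (health, status, index, docs.count and so on).
list_indices() {
  curl -s "$ES_URL/_cat/indices?v"
}
```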
Now we populate it with some data:
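A sketch of indexing a single document, again assuming the local endpoint. Elasticsearch 2.x expects a type in the path; the index_doc helper, the product type and the document body used in the note below are made up for illustration.

```shell
# Assumption: Elasticsearch is reachable here; override ES_URL if it is not.
ES_URL="${ES_URL:-http://localhost:9200}"

# index_doc INDEX TYPE JSON: index one document and let Elasticsearch
# generate the document id (POST on /INDEX/TYPE).
index_doc() {
  curl -s -XPOST "$ES_URL/$1/$2" -d "$3"
}
```

For example: index_doc acme-production product '{"name":"anvil"}'.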
Which we can verify by looking again at the /_cat/indices endpoint.
Because the renaming is nothing more than “create, copy and delete” we need to create a new index with the properties from the old one. To properly achieve that we must then copy the old configuration:
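A sketch of capturing that configuration, assuming jq is installed and that the response is keyed by the index name. The fetch_index_config helper name is hypothetical.

```shell
# Assumption: Elasticsearch is reachable here; override ES_URL if it is not.
ES_URL="${ES_URL:-http://localhost:9200}"

# fetch_index_config NAME: print the object holding "settings" and "mappings".
# The response wraps that object under the index name, so jq selects that key.
fetch_index_config() {
  curl -s "$ES_URL/$1/_settings,_mappings" | jq --arg index "$1" '.[$index]'
}
```

The result can then be stored for later use: index_config=$(fetch_index_config acme-production).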
ps.: here I’m making use of jq, the lightweight command-line JSON processor, to get the mappings and settings objects from within the bigger object returned by the call to /<index>/_settings,_mappings. This way we’re able to assign that to a variable and make use of it later.
Using the old configuration (stored in the index_config variable) we’re able to create the new index based on it:
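A sketch of that step, assuming the configuration JSON is passed as the request body of the create index call. The create_index_from_config helper name is hypothetical, and the sketch uses PUT as in the create index documentation.

```shell
# Assumption: Elasticsearch is reachable here; override ES_URL if it is not.
ES_URL="${ES_URL:-http://localhost:9200}"

# create_index_from_config NAME CONFIG_JSON: create NAME using the given
# settings and mappings object as the request body.
create_index_from_config() {
  curl -s -XPUT "$ES_URL/$1" -d "$2"
}
```

For example: create_index_from_config acme-staging "$index_config".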
ps.: even though there’s a uuid in the $index_config object, it doesn’t matter: it’ll get replaced by a new uuid in the new index.
Having both indices properly configured, we’re ready to copy the data from the old index into the new one using the _reindex action:
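A sketch of the reindex request, assuming the local endpoint. The body names a source index and a destination index; the reindex helper name is hypothetical.

```shell
# Assumption: Elasticsearch is reachable here; override ES_URL if it is not.
ES_URL="${ES_URL:-http://localhost:9200}"

# reindex SOURCE DEST: copy every document from SOURCE into DEST.
reindex() {
  curl -s -XPOST "$ES_URL/_reindex" \
    -d "$(printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}' "$1" "$2")"
}
```

For example: reindex acme-production acme-staging.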
By this time you should already have your new index populated. If you have no intention of making use of the old index, now it’s time to drop it:
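A sketch of the delete call, assuming the local endpoint. The delete_index helper name and the empty-name guard are my additions; the guard only keeps the function from sending a request without an index name.

```shell
# Assumption: Elasticsearch is reachable here; override ES_URL if it is not.
ES_URL="${ES_URL:-http://localhost:9200}"

# delete_index NAME: delete the index NAME; refuses to run without a name.
delete_index() {
  if [ -z "$1" ]; then
    echo "delete_index: missing index name" >&2
    return 1
  fi
  curl -s -XDELETE "$ES_URL/$1"
}
```

For example: delete_index acme-production.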
I received some pretty interesting feedback on Reddit that I’d like to share here.
It turns out that sometimes we can avoid reindexing by making use of aliases (see Elasticsearch Indices Aliases).
The idea is that when we need to reference the contents of an index under another name, we can create a kind of “pointer” to the real index and perform all the normal operations against this pointer (the alias). The aliases API essentially lets us CRUD (create, read, update and delete) aliases, making it totally possible to achieve what we want: a “renaming” of an index, even if only virtually.
Let’s do it then.
First, create an acme-production index just like before and add some data, then create an alias named acme-staging:
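A sketch of the alias creation through the _aliases endpoint, assuming the local endpoint. The add_alias helper name is hypothetical.

```shell
# Assumption: Elasticsearch is reachable here; override ES_URL if it is not.
ES_URL="${ES_URL:-http://localhost:9200}"

# add_alias INDEX ALIAS: make ALIAS point to INDEX using an "add" action.
add_alias() {
  curl -s -XPOST "$ES_URL/_aliases" \
    -d "$(printf '{"actions":[{"add":{"index":"%s","alias":"%s"}}]}' "$1" "$2")"
}
```

For example: add_alias acme-production acme-staging.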
If we check the indices, we’ll see that we don’t have any new index. But we do have an alias:
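Aliases can be listed through the cat aliases endpoint. A sketch, assuming the local endpoint; list_aliases is a hypothetical helper name.

```shell
# Assumption: Elasticsearch is reachable here; override ES_URL if it is not.
ES_URL="${ES_URL:-http://localhost:9200}"

# list_aliases: print one row per alias with the index it points to;
# ?v adds the header row.
list_aliases() {
  curl -s "$ES_URL/_cat/aliases?v"
}
```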
This allows us to perform queries against acme-staging and retrieve data from acme-production:
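A sketch of such a query, assuming the local endpoint. A search without a body matches every document, so running it against the alias should return what lives in the underlying index. The search_all helper name is hypothetical.

```shell
# Assumption: Elasticsearch is reachable here; override ES_URL if it is not.
ES_URL="${ES_URL:-http://localhost:9200}"

# search_all NAME: run a body-less search against an index or an alias;
# ?pretty asks for indented JSON.
search_all() {
  curl -s "$ES_URL/$1/_search?pretty"
}
```

For example: search_all acme-staging.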
Now, what if we want to disallow requests to the old index, as if we had really renamed it and not duplicated it? Then we need to close the old index using the open/close index API:
curl -XPOST http://localhost:9200/acme-production/_close
Then, if we try to get from acme-production, the request fails.
Cool, what we wanted, huh? Now, if we try to get from acme-staging, we can’t retrieve anything either.
It sounds logical to me that we can’t, as the alias is just a pointer to the other index (which was closed).
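One way to observe this for both names is to print only the HTTP status of a search. A sketch, assuming the local endpoint; search_status is a hypothetical helper name, and the exact status code returned for a closed index is not something I’m asserting here.

```shell
# Assumption: Elasticsearch is reachable here; override ES_URL if it is not.
ES_URL="${ES_URL:-http://localhost:9200}"

# search_status NAME: discard the response body and print only the HTTP
# status code of a search against an index or an alias.
search_status() {
  curl -s -o /dev/null -w '%{http_code}\n' "$ES_URL/$1/_search"
}
```

Running search_status acme-production and search_status acme-staging after the close should print an error status for both.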
So, to sum up: if you want new names to point to an existing index (as if you were renaming it), aliases will save you, and you won’t need to copy any data.
If you need something like a real “rename” that disallows access to the old index, then an alias won’t help you (you’ll have to use the reindex + delete strategy).
I had never used aliases before, and it’s pretty good to know that they exist! They can definitely be very useful sometimes.
As someone who never really dug deep into how Elasticsearch works, I found the whole concept of reindexing very easy. The official documentation is pretty good and, with it, I was able to quickly solve the problem. Kudos, Elasticsearch team!
Thanks,
finis
Originally published at ops.tips on November 21, 2017.