Or how to overcome the 5-minute Lambda execution limit

Last December I was rewriting our indexing microservice in a serverless way. Along the way, I had to migrate Elasticsearch from version 2.3 to 6.1. Not that I was eager to rush the upgrade, quite the opposite. But one day Elastic Cloud announced that version 2.3 was approaching end-of-life. As you can imagine, jumping over several major versions introduces breaking changes, so there was good reason to rewrite the microservice from scratch, this time in a serverless way.

Problem, Context, Solution

It was quite an interesting undertaking to shift from queue-based (SQS) to event-driven (SNS) indexing, but I'll leave that for another post.

First of all, when do you reindex data from the database to Elasticsearch?

- Changing the index mapping: you add a new field, or change the type of an old one.
- ES cluster outage: when this happens, no new data is written to the search index.

In both cases, you need to fetch loads of documents from the database and flush everything to Elasticsearch as fast as possible. Let's look at the old way of reindexing when you have a stateful long-running microservice.

The old way: long-running microservice

The problem comes in the "loop", which can take hours to iterate over all documents. And Lambda runs for 5 minutes only. And you don't have state. A nice thing would be SQS support for Lambda, but as of today, it's still on the roadmap. So what if we temporarily keep the state in the DB itself? Not a truly serverless way, but everything is a tradeoff.

By the way, I used Mermaid to generate the sequence diagram above.

Recursive Lambdas

I ended up creating a collection to keep track of reindexing progress.
More like a list of jobs, which include:

- Query selector: a starting point to open a MongoDB cursor
- Progress: number of successful and failed operations
- ID of the last reindexed document: this one is important

The idea is to kick off one Lambda to reindex the first batch of documents, let's say 10,000. It creates a job with an id, puts the data into ES and calls itself recursively. The next invocation knows the query selector by job id and the ID of the last reindexed document, so it can start from the place where the last execution stopped.

It sounds complex, so let's revisit the flow.

The new way: recursive Lambda function

Some tricks which make this possible:

- Sort your query by _id: it's an indexed field and cheap to sort by. Date fields might be more suitable depending on the case.
- Sorted queries allow predictable iteration: every next execution queries by selector + {_id: {$gte: lastDocumentId}}. More on pagination in the article "We're doing pagination wrong".
- Disable the ES index refresh during heavy reindexing, but don't forget to enable it back afterwards (it defaults to 1s):

PUT /index_name/_settings
{
  "index": {
    "refresh_interval": "-1"
  }
}

Wins

It may look like a lot of hassle, but this was the only non-obvious part of the old indexing microservice to migrate. It gave me a chance to rethink the implementation in a way that reindexes millions of documents in a matter of 15-ish minutes.

Since you get a fresh Lambda container every so often, there is little chance of hitting a memory leak, which was an issue before.

Not to forget AWS X-Ray, which plays nicely with Lambda. Many performance bottlenecks were discovered in calls to Mongo / S3 / ES.

And in the end, you gain all the usual perks of serverless. Enjoy!
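To make the job document and the resumable query concrete, here is a minimal sketch in Python. The field names (`succeeded`, `failed`, `last_document_id`) are my assumption, not the exact schema from the post; the key idea is combining the job's base selector with the `{_id: {$gte: lastDocumentId}}` clause described above.

```python
def next_selector(base_selector, last_document_id=None):
    """Build the MongoDB query for the next batch.

    Resumes from the last reindexed document by adding an
    {_id: {$gte: ...}} clause to the job's base selector.
    """
    selector = dict(base_selector)
    if last_document_id is not None:
        selector["_id"] = {"$gte": last_document_id}
    return selector


def update_job(job, batch_ids, failed=0):
    """Record progress after a batch: success/failure counters
    plus the resume point (ID of the last reindexed document)."""
    job = dict(job)
    job["succeeded"] += len(batch_ids) - failed
    job["failed"] += failed
    if batch_ids:
        job["last_document_id"] = batch_ids[-1]
    return job
```

In the real service this job document would live in the database itself, so any Lambda invocation can pick up where the previous one left off just by knowing the job id.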
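The recursive invocation itself can be sketched as a loop with a time budget: index batches until the 5-minute limit is near, then have the function invoke itself and exit. This is a simplified sketch with the database fetch, the ES bulk index, and the self-invocation (in practice a `boto3` `lambda.invoke` with `InvocationType="Event"`) passed in as callables; the batch size and budget are illustrative, not values from the post.

```python
import time

BATCH_SIZE = 10_000        # illustrative; the post mentions ~10 000 per batch
TIME_BUDGET_SECONDS = 240  # leave headroom under the 5-minute Lambda limit


def reindex(fetch_batch, index_batch, invoke_self, job, now=time.monotonic):
    """One Lambda invocation of the recursive reindexer.

    Indexes batch after batch until either the documents run out
    (job done) or the time budget is nearly spent, in which case
    the function schedules itself again with the job id.
    """
    deadline = now() + TIME_BUDGET_SECONDS
    while now() < deadline:
        docs = fetch_batch(job["last_document_id"], BATCH_SIZE)
        if not docs:
            return job  # no more documents: reindexing finished
        index_batch(docs)
        job["last_document_id"] = docs[-1]["_id"]
        job["succeeded"] += len(docs)
    invoke_self(job["id"])  # recurse asynchronously and let this run end
    return job
```

Because all progress lives in the job document, a crashed or timed-out invocation loses at most one batch of work.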