We had this problem where we needed to update our Elasticsearch mappings without breaking search during deployments. Our setup is pretty standard: Spring Boot app with Hibernate Search indexing product data to Elasticsearch, running on Kubernetes with rolling deployments. Works great until you need to change your search schema.
The main issue is that during rolling deployments you've got old and new pods running simultaneously, and they expect different index structures and mappings. Done naively, updating a schema requires downtime while the new index is built and documents are reindexed.
The Problem We Faced
Our specific pain points were:
- Adding new fields without breaking existing functionality
- Changing analysers or field mappings without having a search outage
- Keeping search functional while pods cycle through versions
- Managing the coexistence period where half your pods run the old version and half run the new version
That last one really got us. Kubernetes rolling deployments are great, but they assume your app versions can coexist peacefully. When your search schema changes, that assumption breaks down pretty quickly. We didn't so much solve that problem as sidestep it: we made sure that mixed versions can't break anything during the transition phase.
Why We Avoided Complex Solutions
When we started planning this, we went down some rabbit holes. Spent weeks looking at complex approaches that seemed good on paper.
Alias-based index management with background reindexing sounded clever. I was really into this idea of all these fancy aliases switching around in the background. But when I actually thought about debugging it during an incident, it seemed too complicated.
Blue-green deployments would definitely work. But running two complete Elasticsearch clusters just for search deployments seemed like overkill for our setup.
External orchestration tools came up as well, and we looked at a few. But we already had enough moving pieces, and adding another service that could fail during deployments felt risky and meant more maintenance.
At some point we realised we were making this way too complicated. That's when we stepped back and thought about it differently.
How It Actually Works
So here's what we ended up doing: each schema version gets its own index. Pretty simple when you think about it. When you deploy v2 of your app, it creates a products_v2 index while the v1 pods keep serving from products_v1.
During the rolling deployment, requests can hit either version, but that turned out to be fine. Each pod just searches its own index and returns valid results. Once the deployment wraps up and all pods are running v2, you can clean up the old index.
Here's what actually happens during a deployment:
sequenceDiagram
    participant Dev as Developer
    participant CI as CI/CD Pipeline
    participant K8s as Kubernetes
    participant App as Application Pod
    participant ES as Elasticsearch
    Dev->>CI: Deploy Application v1
    CI->>K8s: Create deployment
    K8s->>App: Start v1 pods
    App->>ES: Check if v1 index exists
    alt Index doesn't exist
        App->>ES: Build v1 index
        ES-->>App: Index ready
    end
    App-->>K8s: v1 Active
    K8s-->>CI: Deployment successful
    CI-->>Dev: v1 deployed successfully
    Dev->>CI: Schema change detected
    CI->>CI: Build v2 image
    CI->>K8s: Start rolling update
    K8s->>App: Deploy v2 pods
    App->>ES: Check if v2 index exists
    alt v2 Index doesn't exist
        App->>ES: Build v2 index
        ES-->>App: v2 Index ready
    else v2 Index already exists
        App->>ES: Use existing v2 index
    end
    App-->>K8s: v2 pods ready
    K8s->>App: Terminate v1 pods
    App-->>K8s: v1 pods terminated
    K8s-->>CI: v2 Active
    alt Issue detected
        K8s->>CI: Execute rollback
        CI->>K8s: Restore v1
        K8s->>App: Start v1 pods
        App-->>K8s: v1 restored
        K8s-->>CI: Rollback complete
        CI-->>Dev: Rollback successful
    else Successful deployment
        App->>ES: Remove old v1 index
        ES-->>App: Cleanup complete
        K8s-->>CI: Deployment complete
        CI-->>Dev: v2 deployed successfully
    end
Implementation Details
Version Configuration
First thing, add a version setting to your config:
app:
  search:
    index-version: v1

hibernate:
  search:
    backend:
      hosts: elasticsearch:9200
      protocol: https
When you need schema changes, bump it to v2. We started with v1 but you could use whatever naming scheme works for you.
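To make that concrete, when v2 adds the new category field (see the entity further down), the only change here is the version:

app:
  search:
    index-version: v2   # bumped from v1 for the schema change

Since this is a regular Spring property, you can also override it per environment, for example through an APP_SEARCH_INDEX_VERSION environment variable thanks to Spring Boot's relaxed binding.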
Custom Index Layout Strategy
The real trick here is getting Hibernate Search to automatically tack the version onto your index names. We had to write a custom IndexLayoutStrategy for this:
@Component
public class VersionedIndexLayoutStrategy implements IndexLayoutStrategy {

    @Value("${app.search.index-version}")
    private String indexVersion;

    @Override
    public String createInitialElasticsearchIndexName(String hibernateSearchIndexName) {
        return hibernateSearchIndexName + "_" + indexVersion;
    }

    @Override
    public String createWriteAlias(String hibernateSearchIndexName) {
        return hibernateSearchIndexName + "_write_" + indexVersion;
    }

    @Override
    public String createReadAlias(String hibernateSearchIndexName) {
        return hibernateSearchIndexName + "_read";
    }

    @Override
    public String extractUniqueKeyFromHibernateSearchIndexName(String hibernateSearchIndexName) {
        return hibernateSearchIndexName + "_" + indexVersion;
    }

    @Override
    public String extractUniqueKeyFromElasticsearchIndexName(String elasticsearchIndexName) {
        return elasticsearchIndexName;
    }
}
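As a quick sanity check, you can exercise the strategy in a plain unit test; here's a minimal sketch (not from our setup above) that uses Spring's ReflectionTestUtils to populate the @Value field:

class VersionedIndexLayoutStrategyTest {

    @Test
    void producesVersionedNamesForTheProductsIndex() {
        VersionedIndexLayoutStrategy strategy = new VersionedIndexLayoutStrategy();
        // Simulate the injected app.search.index-version property
        ReflectionTestUtils.setField(strategy, "indexVersion", "v1");

        // Physical index and write alias carry the version; the read alias name stays stable
        assertEquals("products_v1", strategy.createInitialElasticsearchIndexName("products"));
        assertEquals("products_write_v1", strategy.createWriteAlias("products"));
        assertEquals("products_read", strategy.createReadAlias("products"));
    }
}

Bumping the version to v2 yields products_v2 and products_write_v2 instead, which is exactly what lets old and new pods coexist on separate indices.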
Hibernate Search Configuration
Hook this into your Hibernate Search config:
@Configuration
public class ElasticsearchConfig {

    @Autowired
    private VersionedIndexLayoutStrategy versionedIndexLayoutStrategy;

    @Bean
    public HibernateSearchElasticsearchConfigurer hibernateSearchConfigurer() {
        return context -> {
            ElasticsearchBackendConfiguration backendConfig = context.backend();
            backendConfig.layout().strategy(versionedIndexLayoutStrategy);
        };
    }
}
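Depending on your Hibernate Search and Spring Boot versions, you can also wire the strategy in through plain Hibernate properties instead; a sketch assuming Spring Boot's HibernatePropertiesCustomizer and the hibernate.search.backend.layout.strategy key from Hibernate Search 6:

@Configuration
public class ElasticsearchLayoutConfig {

    // Registers the custom layout strategy under the standard Hibernate Search property key
    @Bean
    public HibernatePropertiesCustomizer searchLayoutCustomizer(VersionedIndexLayoutStrategy strategy) {
        return properties -> properties.put("hibernate.search.backend.layout.strategy", strategy);
    }
}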
Handling Pod Coordination
Here's where it gets interesting. During rolling deployments, you might have multiple new pods spinning up. You really don't want each one trying to build the same index. That's just wasteful and can cause weird race conditions.
We use Elasticsearch's _meta field on the index mapping as a simple coordination mechanism: each pod checks for an "index_built" flag at startup, and the first pod to finish a build writes the flag so later pods skip the rebuild.
@Service
public class IndexBuildService {

    private static final Logger log = LoggerFactory.getLogger(IndexBuildService.class);

    @Autowired
    private SearchSession searchSession;

    @Autowired
    private ElasticsearchClient elasticsearchClient;

    @Value("${app.search.index-version}")
    private String indexVersion;

    @EventListener
    public void onContextRefresh(ContextRefreshedEvent event) {
        buildIndexIfNeeded();
    }

    private void buildIndexIfNeeded() {
        String indexName = "products_" + indexVersion;
        try {
            if (isIndexAlreadyBuilt(indexName)) {
                log.info("Index {} already built, skipping", indexName);
                return;
            }
            log.info("Building index {} - this takes a few minutes", indexName);
            searchSession.massIndexer(Product.class)
                    .purgeAllOnStart(true)
                    .typesToIndexInParallel(1)
                    .batchSizeToLoadObjects(100)
                    .threadsToLoadObjects(4)
                    .idFetchSize(1000)
                    .startAndWait();
            markIndexAsBuilt(indexName);
            log.info("Done building index {}", indexName);
        } catch (Exception e) {
            log.error("Index build failed for {}: {}", indexName, e.getMessage());
            throw new RuntimeException("Index build failed", e);
        }
    }

    private boolean isIndexAlreadyBuilt(String indexName) {
        try {
            boolean exists = elasticsearchClient.indices().exists(
                    ExistsRequest.of(e -> e.index(indexName))
            ).value();
            if (!exists) {
                return false;
            }
            var mappingResponse = elasticsearchClient.indices().getMapping(
                    GetMappingRequest.of(g -> g.index(indexName))
            );
            var mapping = mappingResponse.result().get(indexName);
            if (mapping != null && mapping.mappings() != null) {
                var meta = mapping.mappings().meta();
                if (meta != null && meta.containsKey("index_built")) {
                    return "true".equals(meta.get("index_built").toString());
                }
            }
            return false;
        } catch (Exception e) {
            log.warn("Couldn't check build status for {}: {}", indexName, e.getMessage());
            // When in doubt, assume it's not built and let this pod try
            return false;
        }
    }

    private void markIndexAsBuilt(String indexName) {
        try {
            Map<String, JsonData> metaData = Map.of(
                    "index_built", JsonData.of("true"),
                    "built_at", JsonData.of(Instant.now().toString())
            );
            elasticsearchClient.indices().putMapping(PutMappingRequest.of(p -> p
                    .index(indexName)
                    .meta(metaData)
            ));
            log.info("Marked {} as built", indexName);
        } catch (Exception e) {
            log.error("Failed to mark index as built: {}", e.getMessage());
        }
    }
}
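One caveat: the _meta check isn't atomic, so two pods that start at exactly the same moment can both decide to build. In practice that's mostly wasted work rather than a correctness problem, since both pods index the same documents. If you want a harder guarantee, one option is to lean on the fact that index creation in Elasticsearch is atomic and use a dedicated lock index. A rough sketch, where the lock-index name and helper method are hypothetical rather than part of the service above:

private boolean acquireBuildLock(String indexName) {
    // Hypothetical helper: whichever pod creates the lock index first "wins" the build
    String lockIndex = indexName + "_build_lock";
    try {
        elasticsearchClient.indices().create(CreateIndexRequest.of(c -> c.index(lockIndex)));
        return true;
    } catch (ElasticsearchException e) {
        if ("resource_already_exists_exception".equals(e.error().type())) {
            return false; // another pod got there first
        }
        throw e;
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}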
Entity Classes Remain Unchanged
Your entity classes stay exactly the same:
@Entity
@Indexed(index = "products")
public class Product {

    @Id
    @DocumentId
    private Long id;

    @FullTextField(analyzer = "standard")
    private String name;

    @FullTextField(analyzer = "keyword") // Added this field in v2
    private String category;

    @KeywordField
    private String status;

    // getters/setters...
}
Search Service Also Stays Simple
Your search logic doesn't need to know about versioning:
@Service
public class ProductSearchService {

    @Autowired
    private SearchSession searchSession;

    public List<Product> searchProducts(String query) {
        return searchSession.search(Product.class)
                .where(f -> f.bool()
                        .should(f.match().field("name").matching(query))
                        .should(f.match().field("category").matching(query)))
                .fetchHits(20);
    }
}
The IndexLayoutStrategy handles routing to the right versioned index automatically.
Lessons Learned
Index Building Takes Time
For our index of roughly 200k documents, a full build usually takes 5-6 minutes. Since the build runs during startup, make sure your readiness and startup probes give new pods that much time before Kubernetes gives up on them.
Memory Is Important
Mass indexing loads batches of entities through Hibernate, so it needs a decent amount of heap. Make sure your pods have the headroom, and dial down batchSizeToLoadObjects and threadsToLoadObjects if they don't.
Cleanup Is Still Manual
Old indices just sit there until you delete them, which you should only do once you're confident you won't need to roll back.
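If you'd rather script that step, deleting the previous index is a one-liner with the same Elasticsearch Java client used above; a sketch, assuming the previous version was v1:

// Run this only after the v2 rollout has settled and a rollback is off the table
elasticsearchClient.indices().delete(DeleteIndexRequest.of(d -> d.index("products_v1")));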
Rollbacks Work Well
If you need to rollback a deployment, the old pods come back up and use their original index. Works smoothly.
Our Deployment Process
Our process isn't anything fancy:
- Bump the version number in the config file and make schema changes
- Test on dev and pre-prod environments
- Watch startup logs to ensure the index builds properly
- Run test queries to verify search functionality (plus a quick document-count check like the sketch after this list)
- Clean up old indices
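The document-count check is nothing fancy; a sketch using the Elasticsearch Java client, where the index name and expectation are just illustrative:

// Sanity check after the rollout: the new index should hold roughly the expected number of documents
long count = elasticsearchClient.count(CountRequest.of(c -> c.index("products_v2"))).count();
log.info("products_v2 contains {} documents", count);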
Why This Approach Works for Us
The biggest win is not having to think about it much. Once it's set up, deployments just work. No coordination between services, no background processes to monitor, no complex rollback procedures when something goes sideways.
Storage costs a bit more since you have duplicate indices sitting around temporarily, but dealing with a complex deployment system would cost us way more in engineering time. Plus our Elasticsearch cluster has plenty of space anyway.
Alternatives We Considered
Blue-green deployments came up in our research, but running duplicate environments just for search deployments seemed excessive.
The reindex API seemed promising at first, but when we tested it with our full dataset it took far too long. Maybe it would work for smaller indices, but it wasn't practical for our use case.
Alias switching with background reindexing looked clever in theory, but the error handling got complicated fast. What happens if the reindex fails halfway through? What if the new mapping is incompatible? Too many edge cases.
Final Thoughts
This approach has been working for us for about 8 months now. It's not the most elegant solution. You end up with temporary duplicate indices and manual cleanup, but it's reliable and easy to understand.
The key insight for us was realising that simple solutions are often better than clever ones. Instead of fighting against how Kubernetes rolling deployments work, we designed something that works with them naturally.
Your situation might be different. If you have massive indices or really tight storage constraints, this approach might not work. But for most apps, the simplicity is worth the extra storage cost.
If you're dealing with similar problems, give this a try. The code isn't too complex, and once it's working, you can mostly forget about it.