In today's AI-driven world, search is not just a feature; it is the core of how we interact with information. But have you ever searched for a concept, only to end up frustrated because the results fixate on your exact keywords and miss their actual meaning? For example, a search for "tips for new dog owners" might miss a great article titled "A Guide to Your First Canine Companion." This is the classic limitation of traditional keyword search. The solution isn't to abandon keywords but to enhance them with Hybrid Search, a modern technique that delivers the best of both worlds: the precision of keyword matching and the contextual understanding of modern AI. This article walks you through not just the *what* and *why*, but the *how*, with a complete, hands-on implementation using the open-source vector database Milvus.

## The Two Worlds of Search: Lexical vs. Semantic

Imagine you are searching for "fast running shoes" on an e-commerce site. A traditional search will instantly list the products whose names match "fast", "running", and "shoes", but it will miss products labeled "sneakers" or described as "swift", "quick", or "athletic footwear".

- **Keyword Search (Lexical):** Great for finding exact terms and specific entities (like names or product codes). It works by matching the text itself, often using algorithms like BM25. It's reliable but lacks deeper understanding.
- **Semantic Search (Vector):** Uses AI models to convert text into numerical representations called "vectors". These vectors capture the *meaning* and *context* of the words, which allows it to find conceptually similar results even when the phrasing is completely different.

Hybrid search doesn't force you to choose between lexical and semantic. It brings them together, creating a search experience that is both precise and context-aware, and that delivers far more relevant results.

## Toolkit for Building Hybrid Search

Before we start building, let's gather our tools:

- **A Milvus Instance:** Milvus is our vector database, the specialized library where we'll store and query our text's "meaning." You can run it locally, self-host it, or use the fully managed Zilliz Cloud.
- **Python:** The programming language we'll use.
- **The pymilvus library:** The official Python SDK for talking to Milvus. Install it with `pip install pymilvus`.
- **An Embedding Model:** The AI that acts as our translator, turning text into vectors. For hybrid search, we need a model that can create both dense vectors (for semantic meaning) and sparse vectors (for lexical keywords). A modern model like BGE-M3 can do both, or you can use separate models. See the sketch below for one way to set this up.
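As one possible setup, here is a minimal sketch using pymilvus's optional model package, which wraps BGE-M3 and produces both vector types in a single call. This assumes you have installed the extra with `pip install "pymilvus[model]"`; exact return formats may vary between versions. Note that BGE-M3 emits 1024-dimensional dense vectors, so if you adopt it, use `dim=1024` in the schema below instead of the 768 used in this walkthrough.

```python
# Sketch: generating dense + sparse vectors with BGE-M3 via pymilvus's
# optional model package (pip install "pymilvus[model]").
from pymilvus.model.hybrid import BGEM3EmbeddingFunction

# Downloads BAAI/bge-m3 on first use; set device="cuda:0" if you have a GPU.
ef = BGEM3EmbeddingFunction(model_name="BAAI/bge-m3", device="cpu", use_fp16=False)

docs = ["Milvus is a vector database.", "Hybrid search is powerful."]
embeddings = ef.encode_documents(docs)

# One call yields both representations:
#   embeddings["dense"]  -> list of dense vectors (1024-dim for BGE-M3)
#   embeddings["sparse"] -> sparse matrix, one row per document
print(ef.dim["dense"])  # dense dimensionality reported by the model
```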
## Step-by-Step Implementation Guide

### Step 1: Define a Multi-Vector Schema

Every database needs a blueprint for the data it stores. In Milvus, this is called a schema. For hybrid search, our blueprint needs fields for our text, its dense (semantic) vector, and its sparse (lexical) vector.

```python
from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections

# Connect to the Milvus instance (adjust host/port as needed)
connections.connect("default", host="localhost", port="19530")

# 1. Define fields
id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True)
text_field = FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=2048)

# Dense vector field (e.g., 768 dimensions; match this to your embedding model)
dense_vector_field = FieldSchema(name="dense_vector", dtype=DataType.FLOAT_VECTOR, dim=768)

# Sparse vector field (for SPLADE/BM25-style sparse representations)
sparse_vector_field = FieldSchema(name="sparse_vector", dtype=DataType.SPARSE_FLOAT_VECTOR)

# 2. Define the schema
schema = CollectionSchema(
    fields=[id_field, text_field, dense_vector_field, sparse_vector_field],
    description="Collection for hybrid search implementation"
)

# 3. Create the collection
collection_name = "hybrid_search_articles"
collection = Collection(name=collection_name, schema=schema)
print(f"Collection '{collection_name}' created successfully.")
```

### Step 2: Create Specialized Indexes

If a schema is a blueprint, an index is a super-fast table of contents. To get optimal performance, we need to tell Milvus how to organize our different vector types:

- **Dense vectors** use Approximate Nearest Neighbor (ANN) indexes. `AUTOINDEX` is a great choice: Milvus picks the best one for you.
- **Sparse vectors** have their own special index type, `SPARSE_INVERTED_INDEX`.
```python
# Create an index for the dense vector field
dense_index_params = {
    "index_type": "AUTOINDEX",
    "metric_type": "COSINE",  # Common metric for semantic search
    "params": {}
}
collection.create_index("dense_vector", dense_index_params)

# Create an index for the sparse vector field
sparse_index_params = {
    "index_type": "SPARSE_INVERTED_INDEX",
    "metric_type": "IP",  # Inner Product is standard for sparse vectors
    "params": {}
}
collection.create_index("sparse_vector", sparse_index_params)
print("Indexes created for dense and sparse fields.")
```

### Step 3: Insert Data (with AI-Generated Embeddings)

Now we can populate our collection with data. We take our text documents, use our embedding model to generate both dense and sparse vectors for each, and insert them into Milvus. The following code uses a *mock* function to generate vectors; in a real-world application, you would replace it with calls to your actual AI model.

```python
import random
import numpy as np

# Demo only: replace this with calls to your actual embedding model
def generate_mock_embeddings(texts):
    dense = [np.random.rand(768).tolist() for _ in texts]
    # Sparse vectors are dictionary representations of {index: value}
    sparse = [{random.randint(0, 5000): random.random() for _ in range(10)} for _ in texts]
    return dense, sparse

texts = [
    "Milvus is a vector database.",
    "Hybrid search is powerful.",
    "Semantic search uses AI.",
    "Keyword search is traditional.",
]
dense_vecs, sparse_vecs = generate_mock_embeddings(texts)

data_to_insert = [
    {"text": t, "dense_vector": d, "sparse_vector": s}
    for t, d, s in zip(texts, dense_vecs, sparse_vecs)
]

collection.insert(data_to_insert)
collection.load()  # Load the collection into memory for searching
print(f"Inserted {len(data_to_insert)} records and loaded collection.")
```
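If you set up BGE-M3 as sketched earlier, a real replacement for `generate_mock_embeddings` might look like the following. This is a hedged sketch: it assumes `ef` is the `BGEM3EmbeddingFunction` from the toolkit section, that `dense_vector` was declared with `dim=1024` to match BGE-M3, and that the sparse output is a SciPy sparse matrix (check what your installed version actually returns).

```python
# Sketch: swapping the mock generator for real BGE-M3 embeddings.
# Assumes `ef` is the BGEM3EmbeddingFunction from the earlier setup and
# the schema's dense_vector field uses dim=1024 to match BGE-M3.
import numpy as np
from scipy.sparse import coo_matrix

def generate_real_embeddings(texts):
    out = ef.encode_documents(texts)
    dense = [np.asarray(vec).tolist() for vec in out["dense"]]
    # out["sparse"] is a SciPy sparse matrix (one row per document);
    # convert each row into the {index: value} dict format used above.
    coo = coo_matrix(out["sparse"])
    sparse = [{} for _ in texts]
    for r, c, v in zip(coo.row, coo.col, coo.data):
        sparse[r][int(c)] = float(v)
    return dense, sparse
```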
### Step 4: Execute the Hybrid Search

This is where the magic happens. We take a user query, generate both dense and sparse vectors for it (i.e., run inference on the query), and ask Milvus to perform two searches in parallel. Milvus then uses a reranker to fuse the two result sets into a single, highly relevant list. The most common reranker is Reciprocal Rank Fusion (RRF), which combines the rankings from both searches without requiring complex manual tuning.

```python
from pymilvus import AnnSearchRequest, RRFRanker, WeightedRanker

# Generate query vectors the same way we generated the data vectors
query_text = "What is a vector database?"
query_dense_vector, query_sparse_vector = generate_mock_embeddings([query_text])

# 1. Define the dense search request
req_dense = AnnSearchRequest(
    data=query_dense_vector,  # Your query vector(s)
    anns_field="dense_vector",
    param={"metric_type": "COSINE", "params": {"nprobe": 10}},
    limit=10  # Top 10 from the dense search
)

# 2. Define the sparse search request
req_sparse = AnnSearchRequest(
    data=query_sparse_vector,  # Your query sparse vector(s)
    anns_field="sparse_vector",
    param={"metric_type": "IP", "params": {}},
    limit=10  # Top 10 from the sparse search
)

# 3. Define the reranker: RRF dynamically fuses the two rankings
rerank = RRFRanker()
# Optional: use WeightedRanker to explicitly bias toward semantic results
# rerank = WeightedRanker(0.7, 0.3)

# 4. Execute the hybrid search
results = collection.hybrid_search(
    reqs=[req_dense, req_sparse],
    rerank=rerank,
    limit=5,  # Final number of results to return
    output_fields=["text"]
)

# 5. Process and display the results
print("\nHybrid Search Results:")
for hit in results[0]:  # results[0] because we provided one query vector
    print(f"ID: {hit.id} | Score (RRF): {hit.distance:.4f} | Text: {hit.entity.get('text')}")
```
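To build intuition for what the RRF reranker is doing, here is a conceptual sketch of Reciprocal Rank Fusion in plain Python. This is purely illustrative (Milvus performs the fusion server-side): each ranked list contributes `1 / (k + rank)` to a document's fused score, where the constant `k` (commonly 60) dampens the influence of lower-ranked hits.

```python
# Conceptual sketch of Reciprocal Rank Fusion (illustrative only;
# Milvus performs this fusion server-side via RRFRanker).
def rrf_fuse(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

dense_ranking = ["doc_a", "doc_b", "doc_c"]   # from the semantic search
sparse_ranking = ["doc_b", "doc_d", "doc_a"]  # from the keyword search
print(rrf_fuse([dense_ranking, sparse_ranking]))
# doc_b and doc_a score highest because both searches surfaced them
```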
## Best Practices for Success with Hybrid Search

Implementing the code is just the beginning. To build a truly exceptional search experience, follow these best practices.

### Data and Vector Generation

- **Align Your Models:** The AI models used to embed your documents must be the same models you use to embed your queries. A mismatch is like two librarians speaking different languages.
- **Normalize Dense Vectors:** For metrics like COSINE, normalizing your dense vectors (scaling their "length" to 1) before insertion can improve search accuracy and performance. See the first sketch after these lists.
- **Use Proven Sparse Methods:** Don't invent your own sparse vector generation. Rely on established lexical methods like BM25 or SPLADE to create meaningful, high-quality sparse representations.

### Indexing and Infrastructure

- **Tune Index Parameters:** Defaults are a good start, but tuning index parameters (like `nlist` or `M`) based on your dataset size and your desired speed-vs-accuracy trade-off is crucial for production systems.
- **Leverage Scalar Filtering:** Use the `expr` parameter in your search requests to pre-filter candidates based on metadata (e.g., `category == "electronics"` or `publish_date > 2023`). This dramatically speeds up queries by reducing the search space; see the second sketch after these lists.
- **Monitor and Scale:** Keep an eye on query latency and system metrics. As your data and traffic grow, be prepared to scale your Milvus cluster to maintain performance.
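Here is a minimal sketch of L2 normalization before insertion (assumes NumPy; the small epsilon simply guards against division by zero):

```python
import numpy as np

# Sketch: L2-normalize dense vectors before insertion so COSINE
# similarity operates on unit-length vectors.
def l2_normalize(vectors):
    arr = np.asarray(vectors, dtype=np.float32)
    norms = np.linalg.norm(arr, axis=1, keepdims=True)
    return (arr / np.maximum(norms, 1e-12)).tolist()

dense_vecs = l2_normalize(dense_vecs)  # apply before collection.insert(...)
```

And a sketch of scalar filtering: `AnnSearchRequest` accepts an `expr` string, though the `category` field used here is hypothetical and would need to be added to the Step 1 schema:

```python
from pymilvus import AnnSearchRequest

# Sketch: pre-filter candidates with a boolean expression. The
# "category" field is hypothetical; add it to your schema to use this.
req_dense_filtered = AnnSearchRequest(
    data=query_dense_vector,
    anns_field="dense_vector",
    param={"metric_type": "COSINE", "params": {"nprobe": 10}},
    limit=10,
    expr='category == "electronics"'  # only search matching entities
)
```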
### Reranking Strategy

- **Start with RRF:** Reciprocal Rank Fusion (`RRFRanker`) is the best starting point for most use cases. It balances results effectively without manual weight-tuning.
- **Consider WeightedRanker for Control:** If you have a strong reason to favor one search type over the other (e.g., in e-commerce you might give semantic search 70% weight and keywords 30%), use `WeightedRanker`.
- **Test and Iterate:** The only way to know what's best is to test. Use real-world queries and user feedback to fine-tune your reranking strategy and parameters.

## Summary

By combining the strengths of lexical and semantic search, you can build an intelligent, intuitive, and highly effective search solution that understands user intent, not just keywords. You now have the blueprint and the code to implement it yourself. Happy building!

## References

- Milvus
- BM25
- BGE-M3
- pymilvus