In today's AI-driven world, search is not just a feature; it is the core of how we interact with information. But have you ever searched for a concept, only to end up frustrated because the results fixate on your exact keywords and miss their actual meaning? For example, a search for "tips for new dog owners" might miss a great article titled "A Guide to Your First Canine Companion." This is the classic limitation of traditional keyword search. The solution isn't to abandon keywords but to enhance them with Hybrid Search, a modern technique that delivers the best of both worlds: the precision of keyword matching and the contextual understanding of modern AI. This article walks you through not just the *what* and *why*, but the *how*, with a complete, hands-on implementation using the open-source vector database Milvus.

## The Two Worlds of Search: Lexical vs. Semantic

Imagine you are searching for "fast running shoes" on an e-commerce site. A traditional search will instantly list the products whose names match "fast", "running", and "shoes", but it will miss products labeled "sneakers" or described as "swift", "quick", or "athletic footwear".

- **Keyword Search (Lexical):** Great for finding exact terms and specific entities (like names or product codes). It works by matching the text itself, often using algorithms like BM25. It's reliable but lacks deeper understanding.
- **Semantic Search (Vector):** Uses AI models to convert text into numerical representations called "vectors". These vectors capture the *meaning* and *context* of the words, which allows it to find conceptually similar results even when the phrasing is completely different.

Hybrid search doesn't force you to choose between lexical and semantic. It brings them together, creating a search experience that is both precise and context-aware, and that delivers far more relevant results.

## Toolkit for Building Hybrid Search

Before we start building, let's gather our tools:

- **A Milvus Instance:** Milvus is our vector database, the specialized library where we'll store and query our text's "meaning." You can run it locally, self-host it, or use the fully managed Zilliz Cloud.
- **Python:** The programming language we'll use.
- **The pymilvus library:** The official Python SDK for talking to Milvus. Install it with `pip install pymilvus`.
- **An Embedding Model:** The AI that acts as our translator, turning text into vectors. For hybrid search, we need a model that can create both dense vectors (for semantic meaning) and sparse vectors (for lexical keywords). A modern model like BGE-M3 can do both, or you can use separate models. See the sketch below for one way to set this up.
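As one possible setup, here is a minimal sketch using pymilvus's optional model package, which wraps BGE-M3 and produces both vector types in a single call. This assumes you have installed the extra with `pip install "pymilvus[model]"`; exact return formats may vary between versions. Note that BGE-M3 emits 1024-dimensional dense vectors, so if you adopt it, use `dim=1024` in the schema below instead of the 768 used in this walkthrough.

```python
# Sketch: generating dense + sparse vectors with BGE-M3 via pymilvus's
# optional model package (pip install "pymilvus[model]").
from pymilvus.model.hybrid import BGEM3EmbeddingFunction

# Downloads BAAI/bge-m3 on first use; set device="cuda:0" if you have a GPU.
ef = BGEM3EmbeddingFunction(model_name="BAAI/bge-m3", device="cpu", use_fp16=False)

docs = ["Milvus is a vector database.", "Hybrid search is powerful."]
embeddings = ef.encode_documents(docs)

# One call yields both representations:
#   embeddings["dense"]  -> list of dense vectors (1024-dim for BGE-M3)
#   embeddings["sparse"] -> sparse matrix, one row per document
print(ef.dim["dense"])  # dense dimensionality reported by the model
```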
## Step-by-Step Implementation Guide

### Step 1: Define a Multi-Vector Schema

Every database needs a blueprint for the data it stores. In Milvus, this is called a schema. For hybrid search, our blueprint needs fields for our text, its dense (semantic) vector, and its sparse (lexical) vector.

```python
from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections

# Connect to the Milvus instance (adjust host/port as needed)
connections.connect("default", host="localhost", port="19530")

# 1. Define fields
id_field = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True)
text_field = FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=2048)

# Dense vector field (e.g., 768 dimensions; match this to your embedding model)
dense_vector_field = FieldSchema(name="dense_vector", dtype=DataType.FLOAT_VECTOR, dim=768)

# Sparse vector field (for SPLADE/BM25-style sparse representations)
sparse_vector_field = FieldSchema(name="sparse_vector", dtype=DataType.SPARSE_FLOAT_VECTOR)

# 2. Define the schema
schema = CollectionSchema(
    fields=[id_field, text_field, dense_vector_field, sparse_vector_field],
    description="Collection for hybrid search implementation"
)

# 3. Create the collection
collection_name = "hybrid_search_articles"
collection = Collection(name=collection_name, schema=schema)
print(f"Collection '{collection_name}' created successfully.")
```

### Step 2: Create Specialized Indexes

If a schema is a blueprint, an index is a super-fast table of contents. To get optimal performance, we need to tell Milvus how to organize our different vector types:

- **Dense vectors** use Approximate Nearest Neighbor (ANN) indexes. `AUTOINDEX` is a great choice: Milvus picks the best one for you.
- **Sparse vectors** have their own special index type, `SPARSE_INVERTED_INDEX`.
```python
# Create an index for the dense vector field
dense_index_params = {
    "index_type": "AUTOINDEX",
    "metric_type": "COSINE",  # Common metric for semantic search
    "params": {}
}
collection.create_index("dense_vector", dense_index_params)

# Create an index for the sparse vector field
sparse_index_params = {
    "index_type": "SPARSE_INVERTED_INDEX",
    "metric_type": "IP",  # Inner Product is standard for sparse vectors
    "params": {}
}
collection.create_index("sparse_vector", sparse_index_params)
print("Indexes created for dense and sparse fields.")
```

### Step 3: Insert Data (with AI-Generated Embeddings)

Now we can populate our collection with data. We take our text documents, use our embedding model to generate both dense and sparse vectors for each, and insert them into Milvus. The following code uses a *mock* function to generate vectors; in a real-world application, you would replace it with calls to your actual AI model.

```python
import random
import numpy as np

# Demo only: replace this with calls to your actual embedding model
def generate_mock_embeddings(texts):
    dense = [np.random.rand(768).tolist() for _ in texts]
    # Sparse vectors are dictionary representations of {index: value}
    sparse = [{random.randint(0, 5000): random.random() for _ in range(10)} for _ in texts]
    return dense, sparse

texts = [
    "Milvus is a vector database.",
    "Hybrid search is powerful.",
    "Semantic search uses AI.",
    "Keyword search is traditional.",
]
dense_vecs, sparse_vecs = generate_mock_embeddings(texts)

data_to_insert = [
    {"text": t, "dense_vector": d, "sparse_vector": s}
    for t, d, s in zip(texts, dense_vecs, sparse_vecs)
]

collection.insert(data_to_insert)
collection.load()  # Load the collection into memory for searching
print(f"Inserted {len(data_to_insert)} records and loaded collection.")
```
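If you set up BGE-M3 as sketched earlier, a real replacement for `generate_mock_embeddings` might look like the following. This is a hedged sketch: it assumes `ef` is the `BGEM3EmbeddingFunction` from the toolkit section, that `dense_vector` was declared with `dim=1024` to match BGE-M3, and that the sparse output is a SciPy sparse matrix (check what your installed version actually returns).

```python
# Sketch: swapping the mock generator for real BGE-M3 embeddings.
# Assumes `ef` is the BGEM3EmbeddingFunction from the earlier setup and
# the schema's dense_vector field uses dim=1024 to match BGE-M3.
import numpy as np
from scipy.sparse import coo_matrix

def generate_real_embeddings(texts):
    out = ef.encode_documents(texts)
    dense = [np.asarray(vec).tolist() for vec in out["dense"]]
    # out["sparse"] is a SciPy sparse matrix (one row per document);
    # convert each row into the {index: value} dict format used above.
    coo = coo_matrix(out["sparse"])
    sparse = [{} for _ in texts]
    for r, c, v in zip(coo.row, coo.col, coo.data):
        sparse[r][int(c)] = float(v)
    return dense, sparse
```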
### Step 4: Execute the Hybrid Search

This is where the magic happens. We take a user query, generate both dense and sparse vectors for it (i.e., run inference on the query), and ask Milvus to perform two searches in parallel. Milvus then uses a reranker to fuse the two result sets into a single, highly relevant list. The most common reranker is Reciprocal Rank Fusion (RRF), which combines the rankings from both searches without requiring complex manual tuning.

```python
from pymilvus import AnnSearchRequest, RRFRanker, WeightedRanker

# Generate query vectors the same way we generated the data vectors
query_text = "What is a vector database?"
query_dense_vector, query_sparse_vector = generate_mock_embeddings([query_text])

# 1. Define the dense search request
req_dense = AnnSearchRequest(
    data=query_dense_vector,  # Your query vector(s)
    anns_field="dense_vector",
    param={"metric_type": "COSINE", "params": {"nprobe": 10}},
    limit=10  # Top 10 from the dense search
)

# 2. Define the sparse search request
req_sparse = AnnSearchRequest(
    data=query_sparse_vector,  # Your query sparse vector(s)
    anns_field="sparse_vector",
    param={"metric_type": "IP", "params": {}},
    limit=10  # Top 10 from the sparse search
)

# 3. Define the reranker: RRF dynamically fuses the two rankings
rerank = RRFRanker()
# Optional: use WeightedRanker to explicitly bias toward semantic results
# rerank = WeightedRanker(0.7, 0.3)

# 4. Execute the hybrid search
results = collection.hybrid_search(
    reqs=[req_dense, req_sparse],
    rerank=rerank,
    limit=5,  # Final number of results to return
    output_fields=["text"]
)

# 5. Process and display the results
print("\nHybrid Search Results:")
for hit in results[0]:  # results[0] because we provided one query vector
    print(f"ID: {hit.id} | Score (RRF): {hit.distance:.4f} | Text: {hit.entity.get('text')}")
```
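To build intuition for what the RRF reranker is doing, here is a conceptual sketch of Reciprocal Rank Fusion in plain Python. This is purely illustrative (Milvus performs the fusion server-side): each ranked list contributes `1 / (k + rank)` to a document's fused score, where the constant `k` (commonly 60) dampens the influence of lower-ranked hits.

```python
# Conceptual sketch of Reciprocal Rank Fusion (illustrative only;
# Milvus performs this fusion server-side via RRFRanker).
def rrf_fuse(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

dense_ranking = ["doc_a", "doc_b", "doc_c"]   # from the semantic search
sparse_ranking = ["doc_b", "doc_d", "doc_a"]  # from the keyword search
print(rrf_fuse([dense_ranking, sparse_ranking]))
# doc_b and doc_a score highest because both searches surfaced them
```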
## Best Practices for Success with Hybrid Search

Implementing the code is just the beginning. To build a truly exceptional search experience, follow these best practices.

### Data and Vector Generation

- **Align Your Models:** The AI models used to embed your documents must be the same models you use to embed your queries. A mismatch is like two librarians speaking different languages.
- **Normalize Dense Vectors:** For metrics like COSINE, normalizing your dense vectors (scaling their "length" to 1) before insertion can improve search accuracy and performance. See the first sketch after these lists.
- **Use Proven Sparse Methods:** Don't invent your own sparse vector generation. Rely on established lexical methods like BM25 or SPLADE to create meaningful, high-quality sparse representations.

### Indexing and Infrastructure

- **Tune Index Parameters:** Defaults are a good start, but tuning index parameters (like `nlist` or `M`) based on your dataset size and your desired speed-vs-accuracy trade-off is crucial for production systems.
- **Leverage Scalar Filtering:** Use the `expr` parameter in your search requests to pre-filter candidates based on metadata (e.g., `category == "electronics"` or `publish_date > 2023`). This dramatically speeds up queries by reducing the search space; see the second sketch after these lists.
- **Monitor and Scale:** Keep an eye on query latency and system metrics. As your data and traffic grow, be prepared to scale your Milvus cluster to maintain performance.
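Here is a minimal sketch of L2 normalization before insertion (assumes NumPy; the small epsilon simply guards against division by zero):

```python
import numpy as np

# Sketch: L2-normalize dense vectors before insertion so COSINE
# similarity operates on unit-length vectors.
def l2_normalize(vectors):
    arr = np.asarray(vectors, dtype=np.float32)
    norms = np.linalg.norm(arr, axis=1, keepdims=True)
    return (arr / np.maximum(norms, 1e-12)).tolist()

dense_vecs = l2_normalize(dense_vecs)  # apply before collection.insert(...)
```

And a sketch of scalar filtering: `AnnSearchRequest` accepts an `expr` string, though the `category` field used here is hypothetical and would need to be added to the Step 1 schema:

```python
from pymilvus import AnnSearchRequest

# Sketch: pre-filter candidates with a boolean expression. The
# "category" field is hypothetical; add it to your schema to use this.
req_dense_filtered = AnnSearchRequest(
    data=query_dense_vector,
    anns_field="dense_vector",
    param={"metric_type": "COSINE", "params": {"nprobe": 10}},
    limit=10,
    expr='category == "electronics"'  # only search matching entities
)
```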
### Reranking Strategy

- **Start with RRF:** Reciprocal Rank Fusion (`RRFRanker`) is the best starting point for most use cases. It balances results effectively without manual weight-tuning.
- **Consider WeightedRanker for Control:** If you have a strong reason to favor one search type over the other (e.g., in e-commerce you might give semantic search 70% weight and keywords 30%), use `WeightedRanker`.
- **Test and Iterate:** The only way to know what's best is to test. Use real-world queries and user feedback to fine-tune your reranking strategy and parameters.

## Summary

By combining the strengths of lexical and semantic search, you can build an intelligent, intuitive, and highly effective search solution that understands user intent, not just keywords. You now have the blueprint and the code to implement it yourself. Happy building!

## References

- Milvus
- BM25
- BGE-M3
- pymilvus