What You'll Learn
- Part 1: Evaluating vector databases against real-world filtering requirements
- Part 2: Step-by-step implementation for RAG
Part 1: The Vector Database Evaluation Journey
You've built a chatbot powered by a sophisticated large language model (LLM). It is incredibly confident and impresses everyone in the demo. Then a customer asks:
"Can I change my shipping address after placing an order?"
Your chatbot confidently responds:
"No, shipping addresses cannot be changed once an order is placed. You'll need to cancel and reorder."
Oops.
In reality, your policy allows address changes within 2 hours of ordering. That's one frustrated customer who just posted a 1-star review about your 'broken chatbot'. Multiply that by the 50 others who will ask the same question. It gets even riskier with pricing.
Imagine your chatbot confidently telling customers:
"Our product A costs $99/month," when you actually changed it to $79/month last quarter.
The Search for a Reliable Solution
When I set out to build a reliable chatbot for our customer service, I evaluated several approaches:
Option 1: Fine-tuning - too expensive, and it needs constant retraining as your data changes
Option 2: Bigger models - higher costs with still-outdated knowledge
Option 3: RAG (Retrieval-Augmented Generation) - promising, but with a critical catch
RAG promised to tackle three core challenges of LLMs:
- Hallucination: plausible-sounding but fabricated information
- Static knowledge: an LLM's knowledge is frozen at its training cutoff
- Compute cost: refreshing that knowledge through retraining requires extensive GPU time, eating your budget
The question isn't whether to enhance your chatbot, but how to do it cost-effectively.
The Vector Database Dilemma
RAG promised to solve our problems, but it introduced a new dilemma: "Which vector database should we use for our customer interactions?"
Our chatbot needs to handle queries like:
- What is the return policy for electronics under $100?
- Can I upgrade shipping for orders over $200?
- I want to change my shipping address; I ordered last night.
- Show me laptops between $800 and $1200 with 16GB RAM.
- Find laptops under $500.
Notice the pattern? Real user queries combine numerical filtering with semantic understanding. The goal isn’t just to find similar texts, but to surface relevant information within specific constraints.
When we analysed our conversation logs, we found that queries routinely mixed prices, current promotions, and policy details. That's when it hit me: semantic search alone isn't enough for real business logic. This wasn't an edge case; it was core functionality.
Our vector database needed:
- Semantic understanding (What are they asking?)
- Numerical Filtering (Within what limits?)
- Both together, seamlessly
The Vector Database Shortlist
Four options emerged as serious contenders:
- ChromaDB — the simplest option, optimized for ease of use
- Pinecone — a fully managed solution with no infrastructure overhead
- Milvus — the clear choice for large-scale deployments
- Weaviate — a flexible platform with multiple hosting options
The question wasn't which was "best" in theory, but which was right for our specific, filter-heavy, production-ready chatbot. Let's start with a glance at the setup.
Vector Database Setup Comparison
| Feature | ChromaDB | Pinecone | Weaviate | Milvus |
| --- | --- | --- | --- | --- |
| Installation | pip install chromadb | pip install pinecone-client | pip install weaviate-client | pip install pymilvus |
| Quick Start | Instant | API key only | Docker or cloud | Docker Compose |
| Setup Time | 5 minutes | 5-10 minutes | 3-5 minutes (cloud), ~20 minutes (Docker) | 45+ minutes |
| Infrastructure | None needed | None needed | Docker/K8s/Cloud | Docker Compose / K8s |
| Free Tier Duration | Forever | 30 days | 14 days (cloud) | Trial (Zilliz) |
| After Free Tier | Still free | Pay or delete | Pay or self-host | Pay or self-host |
| Local Development | Excellent | Cloud-only | Simple with Docker | Complex |
| Learning Curve | Easy | Easy | Medium | Hard |
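To make "Instant" concrete, here is roughly what ChromaDB's zero-setup quick start looks like - a minimal sketch using the in-memory client, with no server, Docker, or API key involved (the collection name and sample document are just illustrative):

import chromadb

# Ephemeral, in-memory instance: nothing to set up beyond the pip package
client = chromadb.Client()
collection = client.create_collection("quickstart")

# Add a document; ChromaDB embeds it with its default local model
collection.add(
    documents=["Standard shipping is free for orders over $50."],
    metadatas=[{"category": "Shipping"}],
    ids=["faq-1"],
)
print(collection.query(query_texts=["shipping cost"], n_results=1))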
Query Comparison
We've seen the setup; now let's explore the query code, which tells the true story. Here's how each database handles the exact queries our customers ask.
Test Query 1: Give me laptops that are below $500
1. Weaviate
from weaviate.classes.query import Filter

result = products.query.near_text(
    query="laptop",
    filters=Filter.by_property("price").less_than(500),
    limit=10
)
result.objects[0].properties['name']
2. Pinecone
result = index.query(
    vector=get_embedding("laptop"),
    filter={"price": {"$lt": 500}},
    top_k=10,
    include_metadata=True
)
result['matches'][0]['metadata']['name']
3. ChromaDB
result = collection.query(
query_texts=["laptop"],
where={"price": {"$lt": 500}},
n_results=10
)
result['metadatas'][0][0]['name']
4. Milvus
result = collection.search(
data=[get_embedding("laptop")],
anns_field="embedding",
param={"metric_type": "L2", "params": {"nprobe": 10}},
expr="price < 500",
limit=10,
output_fields=["name", "price"]
)
result[0][0].entity.get('name')
Let's examine what happens with a slightly more complex query: "Show me laptops between $800 and $1200 with 16GB RAM."
1. Weaviate
result = products.query.hybrid(
query="laptop 16GB RAM",
filters=(
Filter.by_property("price").greater_or_equal(800) &
Filter.by_property("price").less_or_equal(1200) &
Filter.by_property("ram").equal("16GB")
),
limit=10
)
# Access results
for product in result.objects:
    print(f"{product.properties['name']}: ${product.properties['price']}")
2. Pinecone
result = index.query(
vector=get_embedding("laptop 16GB RAM"),
filter={
"price": {"$gte": 800, "$lte": 1200},
"ram": {"$eq": "16GB"}
},
top_k=10,
include_metadata=True
)
# Access results
for match in result['matches']:
    print(f"{match['metadata']['name']}: ${match['metadata']['price']}")
3. ChromaDB
result = collection.query(
query_texts=["laptop 16GB RAM"],
where={
"$and": [
{"price": {"$gte": 800}},
{"price": {"$lte": 1200}},
{"ram": {"$eq": "16GB"}}
]
},
n_results=10
)
# Access results
for meta in result['metadatas'][0]:
    print(f"{meta['name']}: ${meta['price']}")
4. Milvus
result = collection.search(
data=[get_embedding("laptop 16GB RAM")],
anns_field="embedding",
param={"metric_type": "L2", "params": {"nprobe": 10}},
expr='price >= 800 && price <= 1200 && ram == "16GB"',
limit=10,
output_fields=["name", "price", "ram"]
)
# Access results
for hits in result:
    for hit in hits:
        print(f"{hit.entity.get('name')}: ${hit.entity.get('price')}")
Filter Syntax in Practice: Where developer experience meets production reality
| Database | Type Safety | Readability | Developer Experience | Best For |
| --- | --- | --- | --- | --- |
| Weaviate | Compile-time validation (client-side) | Excellent - reads like natural language | Clean method chaining, intuitive API | Complex business logic with multiple filter conditions |
| Pinecone | Runtime validation (server-side) | Good - JSON/dictionary syntax | Simple dictionary-based filters | Teams that want managed infrastructure (zero ops) |
| ChromaDB | No validation (client-side only) | Okay - nested dictionary structure | Dictionary-based syntax, minimal learning curve | Prototypes and MVPs without complex filtering |
| Milvus | Runtime only (string parsing) | Complex - string expressions | String-based expressions, error-prone | High-performance, large-scale deployments |
The Decision: How We Found Our Perfect Match
After weeks of evaluation, technical deep-dives, and real prototyping, we arrived at a clear winner. But this wasn't about finding the "best" vector database—it was about finding the right partner for our specific journey.
Our Non-Negotiables: The Filter That Filtered Our Options
We built our decision framework around what truly mattered for our team, timeline, and business goals:
1. Week-1 Prototyping: We needed working code in 7 days, not 7 weeks
2. Future Self-Hosting: Cloud today, on-prem tomorrow without API changes
3. Zero-Cost Experimentation: Test ideas without budget approvals
4. Developer-First Experience: No DevOps PhD required
5. Complex Filtering + Hybrid Search: Our chatbot's core competency
6. Clean, Predictable Results: No black-box scoring mysteries
Why It Felt Like Finding "The One"
| Our Anxiety | Weaviate's Answer |
| --- | --- |
| We'll get stuck in DevOps hell | Single Docker container or cloud instance |
| Our prototype will take months | Working in hours, production-ready in days |
| Filtering will be hacky and slow | Native, optimized filtering during search |
| We'll outgrow it quickly | Scales beautifully to 10M+ vectors |
| The learning curve will stall us | Intuitive API our junior devs mastered quickly |
Part 2: Building Your RAG Chatbot
After that exhaustive evaluation, we've arrived at our destination: Weaviate! Yes, it was a journey of testing, comparing, and prototyping - but every step was necessary. We didn't just pick a tool; we found a solution that fits our team, our timeline, and our technical requirements. Now comes the exciting part: let's roll up our sleeves and build something amazing. I promise the implementation is much smoother than the evaluation was!
Step 1: Setting up your Weaviate Cloud Instance
Before we dive into code, let's get our cluster and credentials ready.
1.1 Create your Weaviate Cloud Account
Head over to the Weaviate Cloud Console and sign up. The free tier gives you enough resources to follow along with this tutorial.
1.2 Launch a New Cluster
Click the "Create Cluster" button and configure.
- Cluster Name
- Cloud Provider
- Region: Select the region closest to your users for low latency
- Tier: Start with the Sandbox tier - it's free and perfect for prototyping without cost concerns. Note that it expires after 14 days, as shown in the comparison table above.
1.3 Secure Your Connection
Once your cluster is provisioned (it takes a few minutes), you will need to set up:
- API Key: Click "Create API Key" to generate one. Treat this key like a password - anyone who has it can access your entire vector database.
Step 2: Core Functions: The Engine of Our RAG System
I'll walk you through the key functions that make up our RAG system. For the complete implementation with all imports, helper functions, and configuration, check out the notebook at the end.
2.1 Loading Secret Keys
This walkthrough runs in a Kaggle notebook, so the credentials come from Kaggle's UserSecretsClient:
from kaggle_secrets import UserSecretsClient

user_secret = UserSecretsClient()
weaviate_key = user_secret.get_secret("weaviate_key")  # the "ADMIN" API key from step 1.3
weaviate_url = user_secret.get_secret("weaviate_url")  # the cluster's "REST Endpoint" URL
2.2 Synthetic FAQ Data
I have generated a synthetic FAQ dataset that mirrors real customer service conversations. Here is the structure:
[
{
"question": "How much does shipping cost?",
"answer": "Shipping costs depend on your order total, shipping method, and destination. Standard shipping is free for orders over $50, otherwise it's $4.99. Express shipping costs $9.99, and overnight shipping is $19.99.",
"category": "Shipping",
"subcategory": "costs",
"tags": [
"pricing",
"shipping delivery"
]
}
]
2.3 The Embedding Generator
This function transforms text into 384-dimensional normalized vectors suitable for similarity search in a database. Every FAQ gets converted into a mathematical fingerprint.
import numpy as np
from sentence_transformers import SentenceTransformer

def get_embedding(text: str, embedder: SentenceTransformer) -> list:
    """Generate a normalized embedding for the given text."""
    embedding = embedder.encode([text])[0]
    vector = np.array(embedding).astype("float32")
    return (vector / np.linalg.norm(vector)).tolist()
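A quick sanity check on the output (reusing the all-MiniLM-L6-v2 model that the pipeline below initializes):

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

vec = get_embedding("How much does shipping cost?", embedder)
print(len(vec))                       # 384 dimensions
print(round(np.linalg.norm(vec), 4)) # 1.0 - unit length

Normalizing up front is a deliberate choice: with unit vectors, cosine similarity reduces to a dot product, so rankings stay consistent regardless of the distance metric the index uses.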
2.4 Data Validation with Pydantic
from typing import List
from pydantic import BaseModel, Field

class FAQ(BaseModel):
    question: str = Field(description="shipping and billing questions")
    answer: str = Field(description="shipping and billing answers")
    tags: List[str] = Field(description="tags for related customer questions such as payment options, refund process, billing and shipping information")
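Why bother with Pydantic here? Validation catches malformed records before they reach the database. A small illustration, using the sample FAQ from section 2.2:

from pydantic import ValidationError

# A well-formed record parses cleanly...
faq = FAQ(
    question="How much does shipping cost?",
    answer="Standard shipping is free for orders over $50, otherwise it's $4.99.",
    tags=["pricing", "shipping delivery"],
)

# ...while a malformed one fails loudly instead of silently corrupting the index
try:
    FAQ(question="Hi", answer="Hello", tags="not-a-list")
except ValidationError as err:
    print(err)  # reports that 'tags' must be a list of strings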
2.5 Schema Bridge: Pydantic to Weaviate
This function converts your Pydantic data model into Weaviate's schema:
def convert_schema_to_weaviate(model: BaseModel) -> List[Property]:
    """Convert Pydantic model fields to Weaviate properties with robust type handling."""
    # Handles: str, int, bool, List[str], nested models
    # Returns Weaviate-ready Property objects
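The full implementation is in the notebook; here is a minimal sketch of the idea, assuming Pydantic v2 and the v4 weaviate-client (nested models omitted for brevity):

from typing import List, get_args, get_origin
from pydantic import BaseModel
from weaviate.classes.config import DataType, Property

# Assumed mapping from Python annotations to Weaviate data types
SIMPLE_TYPES = {str: DataType.TEXT, int: DataType.INT,
                bool: DataType.BOOL, float: DataType.NUMBER}

def convert_schema_to_weaviate(model: type[BaseModel]) -> List[Property]:
    """Map each Pydantic field to a Weaviate Property (sketch)."""
    properties = []
    for name, field in model.model_fields.items():
        if get_origin(field.annotation) is list and get_args(field.annotation) == (str,):
            data_type = DataType.TEXT_ARRAY  # List[str] becomes a text array
        else:
            data_type = SIMPLE_TYPES[field.annotation]
        properties.append(Property(name=name, data_type=data_type,
                                   description=field.description or ""))
    return properties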
2.6 Build Weaviate Collection
This function creates your entire Weaviate collection, complete with vector indexing:
def build_weaviate_collection(
    client: weaviate.WeaviateClient,
    model: BaseModel,
    class_name: str,
    class_description: str = "",
    vector_index_config = Configure.VectorIndex.hnsw()
) -> List[Property]:
    """Turn a Pydantic model into a Weaviate collection."""
    # 1. Convert the model to schema properties
    # 2. Configure vector indexing
    # 3. Create the collection in the cloud
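Here, too, a sketch of what the body might look like with the v4 client. Since we generate embeddings ourselves, no server-side vectorizer is configured:

from weaviate.classes.config import Configure

def build_weaviate_collection(client, model, class_name, class_description=""):
    """Create a Weaviate collection from a Pydantic model (sketch)."""
    if client.collections.exists(class_name):
        return client.collections.get(class_name)  # idempotent across re-runs
    return client.collections.create(
        name=class_name,
        description=class_description,
        properties=convert_schema_to_weaviate(model),
        vector_index_config=Configure.VectorIndex.hnsw(),
        vectorizer_config=Configure.Vectorizer.none(),  # we supply our own vectors
    )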
2.7 Data Ingestion: From JSON to Vector Search
This is where you convert your raw FAQ data into searchable vectors:
def load_faqs_to_weaviate(
    collection_name: str,
    embedder: SentenceTransformer,
    file_path: str,
    client: weaviate.WeaviateClient
) -> None:
    """Embed and load FAQ data into an existing Weaviate collection using a SentenceTransformer."""
    # Pipeline: JSON validation -> embedding generation -> batch import
    # Includes error handling and duplicate prevention
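A minimal sketch of the ingestion path. Two assumptions worth flagging: we embed only the question text, and we derive a deterministic UUID from it so re-running the import cannot create duplicates:

import json
from weaviate.util import generate_uuid5

def load_faqs_to_weaviate(collection_name, embedder, file_path, client):
    """Validate, embed, and batch-import FAQs (sketch)."""
    with open(file_path) as f:
        raw_faqs = json.load(f)
    collection = client.collections.get(collection_name)
    with collection.batch.dynamic() as batch:  # batching amortizes network round-trips
        for raw in raw_faqs:
            faq = FAQ(question=raw["question"], answer=raw["answer"], tags=raw["tags"])
            batch.add_object(
                properties=faq.model_dump(),
                vector=get_embedding(faq.question, embedder),
                uuid=generate_uuid5(faq.question),  # same question, same ID: no duplicates
            )
    print(f"Successfully Loaded {len(raw_faqs)} FAQs into collection {collection_name}")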
2.8 Connecting It All Together
import weaviate
from weaviate.classes.init import Auth
from sentence_transformers import SentenceTransformer

# 1. Connect to Weaviate Cloud
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=weaviate_url,
    auth_credentials=Auth.api_key(weaviate_key)
)

# 2. Create the collection with our schema
name = "ecommerce_faqs"
desc = "Shipping, Billing, Customer queries"
print(client.collections.exists(name))  # False on a fresh cluster
build_weaviate_collection(client, FAQ, name, desc)

# 3. Initialize our embedding model
embedder = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# 4. Load and vectorize our FAQs
load_faqs_to_weaviate(
    collection_name="ecommerce_faqs",
    embedder=embedder,
    file_path="/kaggle/input/shipping-data-set/shipping_billing_faqs.json",
    client=client
)
The output:
Loaded 20 FAQs from /kaggle/input/shipping-data-set/shipping_billing_faqs.json
Generating embeddings and importing FAQs for collection class.....ecommerce_faqs
Successfully Loaded 20 FAQs into collection ecommerce_faqs
We built:
- A vector database connection to Weaviate Cloud
- Schema creation with the necessary fields
- Embedding generation
- Data ingestion with vectorization and indexing
- An FAQ search engine
Yay, We're Ready to Search
We now have semantic understanding of customer questions, which means accurate, grounded answers. A Weaviate hybrid search finds the 3 most relevant FAQs:
# 1. Ask a natural language question
query = "Explain me about my delayed order"
collection = client.collections.get("ecommerce_faqs")

# 2. Perform hybrid search (keyword + semantic)
results = collection.query.hybrid(
    query=query,
    vector=get_embedding(query, embedder),
    limit=3,
    return_properties=["answer", "question"],
    return_metadata=["score"],
).objects
# print(results)  # uncomment to inspect the raw result objects

# 3. Display the results
for i, result in enumerate(results, 1):
    score = getattr(result.metadata, "score", "N/A")
    question = result.properties.get("question", "N/A")
    answer = result.properties.get("answer", "No answer available")
    print(f"I:{i}, Score{score}, Question={question}, Answer={answer}")
    print("****************")
Output
I:1, Score1.0, Question=Why is my order delayed?, Answer=Delays can occur due to weather conditions, carrier issues, customs processing, or incorrect address information. Please check your tracking number for detailed updates or contact our support team for assistance.
****************
I:2, Score0.6934776306152344, Question=How can I track my order?, Answer=You can track your order using the tracking link in your shipping confirmation email, or log into your account and visit the "Order History" section. Tracking updates are provided by the carrier every 24 hours.
****************
I:3, Score0.6007568836212158, Question=Can I change my payment method for an existing order?, Answer=You can change the payment method for an order that hasn't shipped yet. Please contact customer service with your order number and new payment details. Once an order ships, payment method changes are not possible.
****************
The Score Explained
- 0.8+: Excellent match (directly answers your question)
- 0.6-0.8: Good match (related information for your question)
- Below 0.6: Weak match (may not be relevant to your question)
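In practice, you can turn these bands into a guardrail so weak matches never reach the LLM. A sketch, assuming a 0.6 cutoff (tune it against your own query logs):

RELEVANCE_THRESHOLD = 0.6  # assumed cutoff - adjust based on your data

grounded = [r for r in results if (r.metadata.score or 0) >= RELEVANCE_THRESHOLD]
if not grounded:
    # Better to admit uncertainty than to hallucinate an answer
    print("Sorry, I couldn't find a confident answer - let me route you to a human agent.")
else:
    # The surviving FAQs become the grounding context for the LLM prompt
    context = "\n\n".join(r.properties["answer"] for r in grounded)
    print(context)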
This isn't just search - it's understanding. Our old keyword-only search failed whenever a question was phrased in different words; our RAG system handles that with ease. For those interested, here's the link to my code, so you can follow along or adapt it for your own purposes!
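One housekeeping note before we wrap up: the v4 Python client keeps its connections open, so close it once you're done to avoid resource warnings:

client.close()  # releases the Weaviate client's open connections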
What's Next?
Curious about what happens after filtering? In my next piece, I'll dive into how prompt engineering bridges filtered data with natural LLM responses. Stay tuned!!!
