What You'll Learn
- Part 1: Evaluating vector databases against real-world filtering requirements
- Part 2: Step-by-step implementation for RAG
Part 1: The Vector Database Evaluation Journey
You've built a chatbot powered by a sophisticated large language model (LLM). It is incredibly confident and impresses everyone in the demo. Then a customer asks:
"Can I change my shipping address after placing an order?"
Your chatbot confidently responds:
"No, shipping addresses cannot be changed once an order is placed. You'll need to cancel and reorder."
Oops.
In reality, your policy allows address changes within 2 hours of ordering. That's one frustrated customer who just posted a 1-star review about your 'broken chatbot'. Multiply that by the 50 others who will ask the same question. It gets even riskier with pricing.
Imagine your chatbot confidently telling customers:
"Our product A costs $99/month," when you actually changed it to $79/month last quarter.
The Search for a Reliable Solution
When I set out to build a reliable chatbot for our customer service, I evaluated several approaches:
Option 1: Fine-tuning - too expensive, and it needs constant retraining as your data changes
Option 2: Bigger models - higher costs with still-outdated knowledge
Option 3: RAG (Retrieval-Augmented Generation) - promising, but with a critical catch
RAG promised to tackle three core challenges of LLMs:
- Hallucination: plausible-sounding but fabricated information
- Static knowledge: an LLM's knowledge is frozen at its training cutoff
- Compute cost: refreshing that knowledge through retraining requires extensive GPU time, eating your budget
The question isn't whether to enhance your chatbot, but how to do it cost-effectively.
The Vector Database Dilemma
RAG promised to solve our problems, but it introduced a new dilemma: "Which vector database should we use for our customer interactions?"
Our chatbot needs to handle queries like:
- What is the return policy for electronics under $100?
- Can I upgrade shipping for orders over $200?
- I want to change my shipping address; I ordered last night.
- Show me laptops between $800 and $1200 with 16GB RAM.
- Find laptops under $500.
Notice the pattern? Real user queries combine numerical filtering with semantic understanding. The goal isn’t just to find similar texts, but to surface relevant information within specific constraints.
When we analysed our conversation logs, we found that queries routinely mixed prices, current promotions, and policy details. That's when it hit me: semantic search alone isn't enough for real business logic. This wasn't an edge case; it was core functionality.
Our vector database needed:
- Semantic understanding (What are they asking?)
- Numerical Filtering (Within what limits?)
- Both together, seamlessly
The Vector Database Shortlist
Four options emerged as serious contenders:
- ChromaDB — the simplest option, optimized for ease of use
- Pinecone — a fully managed solution with no infrastructure overhead
- Milvus — the clear choice for large-scale deployments
- Weaviate — a flexible platform with multiple hosting options
The question wasn't which was "best" in theory, but which was right for our specific, filter-heavy, production-ready chatbot. Let's start with a glance at the setup.
Vector Database Setup Comparison
| Feature | ChromaDB | Pinecone | Weaviate | Milvus |
| --- | --- | --- | --- | --- |
| Installation | pip install chromadb | pip install pinecone-client | pip install weaviate-client | pip install pymilvus |
| Quick Start | Instant | API key only | Docker or cloud | Docker Compose |
| Setup Time | 5 minutes | 5-10 minutes | 3-5 minutes (cloud), ~20 minutes (Docker) | 45+ minutes |
| Infrastructure | None needed | None needed | Docker/K8s/Cloud | Docker Compose / K8s |
| Free Tier Duration | Forever | 30 days | 14 days (cloud) | Trial (Zilliz) |
| After Free Tier | Still free | Pay or delete | Pay or self-host | Pay or self-host |
| Local Development | Excellent | Cloud-only | Simple with Docker | Complex |
| Learning Curve | Easy | Easy | Medium | Hard |
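To make "Instant" concrete, here is roughly what ChromaDB's zero-setup quick start looks like - a minimal sketch using the in-memory client, with no server, Docker, or API key involved (the collection name and sample document are just illustrative):

import chromadb

# Ephemeral, in-memory instance: nothing to set up beyond the pip package
client = chromadb.Client()
collection = client.create_collection("quickstart")

# Add a document; ChromaDB embeds it with its default local model
collection.add(
    documents=["Standard shipping is free for orders over $50."],
    metadatas=[{"category": "Shipping"}],
    ids=["faq-1"],
)
print(collection.query(query_texts=["shipping cost"], n_results=1))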
Query Comparison
We've seen the setup; now let's explore the query code, which tells the true story. Here's how each database handles the exact queries our customers ask.
Test Query 1: Give me laptops that are below $500
1. Weaviate
from weaviate.classes.query import Filter

result = products.query.near_text(
    query="laptop",
    filters=Filter.by_property("price").less_than(500),
    limit=10
)
result.objects[0].properties['name']
2. Pinecone
result = index.query(
    vector=get_embedding("laptop"),
    filter={"price": {"$lt": 500}},
    top_k=10,
    include_metadata=True
)
result['matches'][0]['metadata']['name']
3. ChromaDB
result = collection.query(
query_texts=["laptop"],
where={"price": {"$lt": 500}},
n_results=10
)
result['metadatas'][0][0]['name']
4. Milvus
result = collection.search(
data=[get_embedding("laptop")],
anns_field="embedding",
param={"metric_type": "L2", "params": {"nprobe": 10}},
expr="price < 500",
limit=10,
output_fields=["name", "price"]
)
result[0][0].entity.get('name')
Let's examine what happens with a slightly more complex query: "Show me laptops between $800 and $1200 with 16GB RAM."
1. Weaviate
result = products.query.hybrid(
query="laptop 16GB RAM",
filters=(
Filter.by_property("price").greater_or_equal(800) &
Filter.by_property("price").less_or_equal(1200) &
Filter.by_property("ram").equal("16GB")
),
limit=10
)
# Access results
for product in result.objects:
    print(f"{product.properties['name']}: ${product.properties['price']}")
2. Pinecone
result = index.query(
vector=get_embedding("laptop 16GB RAM"),
filter={
"price": {"$gte": 800, "$lte": 1200},
"ram": {"$eq": "16GB"}
},
top_k=10,
include_metadata=True
)
# Access results
for match in result['matches']:
    print(f"{match['metadata']['name']}: ${match['metadata']['price']}")
3. ChromaDB
result = collection.query(
query_texts=["laptop 16GB RAM"],
where={
"$and": [
{"price": {"$gte": 800}},
{"price": {"$lte": 1200}},
{"ram": {"$eq": "16GB"}}
]
},
n_results=10
)
# Access results
for meta in result['metadatas'][0]:
    print(f"{meta['name']}: ${meta['price']}")
4. Milvus
result = collection.search(
data=[get_embedding("laptop 16GB RAM")],
anns_field="embedding",
param={"metric_type": "L2", "params": {"nprobe": 10}},
expr='price >= 800 && price <= 1200 && ram == "16GB"',
limit=10,
output_fields=["name", "price", "ram"]
)
# Access results
for hits in result:
    for hit in hits:
        print(f"{hit.entity.get('name')}: ${hit.entity.get('price')}")
Filter Syntax in Practice: Where developer experience meets production reality
| Database | Type Safety | Readability | Developer Experience | Best For |
| --- | --- | --- | --- | --- |
| Weaviate | Compile-time validation (client-side) | Excellent - reads like natural language | Clean method chaining, intuitive API | Complex business logic with multiple filter conditions |
| Pinecone | Runtime validation (server-side) | Good - JSON/dictionary syntax | Simple dictionary-based filters | Teams that want managed infrastructure (zero ops) |
| ChromaDB | No validation (client-side only) | Okay - nested dictionary structure | Dictionary-based syntax, minimal learning curve | Prototypes and MVPs without complex filtering |
| Milvus | Runtime only (string parsing) | Complex - string expressions | String-based expressions, error-prone | High-performance, large-scale deployments |
The Decision: How We Found Our Perfect Match
After weeks of evaluation, technical deep-dives, and real prototyping, we arrived at a clear winner. But this wasn't about finding the "best" vector database—it was about finding the right partner for our specific journey.
Our Non-Negotiables: The Filter That Filtered Our Options
We built our decision framework around what truly mattered for our team, timeline, and business goals:
1. Week-1 Prototyping: We needed working code in 7 days, not 7 weeks
2. Future Self-Hosting: Cloud today, on-prem tomorrow without API changes
3. Zero-Cost Experimentation: Test ideas without budget approvals
4. Developer-First Experience: No DevOps PhD required
5. Complex Filtering + Hybrid Search: Our chatbot's core competency
6. Clean, Predictable Results: No black-box scoring mysteries
Why It Felt Like Finding "The One"
| Our Anxiety | Weaviate's Answer |
| --- | --- |
| We'll get stuck in DevOps hell | Single Docker container or cloud instance |
| Our prototype will take months | Working in hours, production-ready in days |
| Filtering will be hacky and slow | Native, optimized filtering during search |
| We'll outgrow it quickly | Scales beautifully to 10M+ vectors |
| The learning curve will stall us | Intuitive API our junior devs mastered quickly |
Part 2: Building Your RAG Chatbot
After that exhaustive evaluation, we've arrived at our destination: Weaviate! Yes, it was a journey of testing, comparing, and prototyping - but every step was necessary. We didn't just pick a tool; we found a solution that fits our team, our timeline, and our technical requirements. Now comes the exciting part: let's roll up our sleeves and build something amazing. I promise the implementation is much smoother than the evaluation was!
Step 1: Setting up your Weaviate Cloud Instance
Before we dive into code, let's get our cluster and credentials ready.
1.1 Create your Weaviate Cloud Account
Head over to the Weaviate Cloud Console and sign up. The free tier gives you enough resources to follow along with this tutorial.
1.2 Launch a New Cluster
Click the "Create Cluster" button and configure.
- Cluster Name
- Cloud Provider
- Region: Select the region closest to your users for low latency
- Tier: Start with the Sandbox tier - it's free and perfect for prototyping without cost concerns. Note that it expires after 14 days, as shown in the comparison table above.
1.3 Secure Your Connection
Once your cluster is provisioned (it takes a few minutes), you will need to set up:
- API Key: Click "Create API Key" to generate one. Treat this key like a password - anyone who has it can access your entire vector database.
Step 2: Core Functions: The Engine of Our RAG System
I'll walk you through the key functions that make up our RAG system. For the complete implementation with all imports, helper functions, and configuration, check out the notebook at the end.
2.1 Loading Secret Keys
This walkthrough runs in a Kaggle notebook, so the credentials come from Kaggle's UserSecretsClient:
from kaggle_secrets import UserSecretsClient

user_secret = UserSecretsClient()
weaviate_key = user_secret.get_secret("weaviate_key")  # the "ADMIN" API key from step 1.3
weaviate_url = user_secret.get_secret("weaviate_url")  # the cluster's "REST Endpoint" URL
2.2 Synthetic FAQ Data
I have generated a synthetic FAQ dataset that mirrors real customer service conversations. Here is the structure:
[
{
"question": "How much does shipping cost?",
"answer": "Shipping costs depend on your order total, shipping method, and destination. Standard shipping is free for orders over $50, otherwise it's $4.99. Express shipping costs $9.99, and overnight shipping is $19.99.",
"category": "Shipping",
"subcategory": "costs",
"tags": [
"pricing",
"shipping delivery"
]
}
]
2.3 The Embedding Generator
This function transforms text into 384-dimensional normalized vectors suitable for similarity search in a database. Every FAQ gets converted into a mathematical fingerprint.
import numpy as np
from sentence_transformers import SentenceTransformer

def get_embedding(text: str, embedder: SentenceTransformer) -> list:
    """Generate a normalized embedding for the given text."""
    embedding = embedder.encode([text])[0]
    vector = np.array(embedding).astype("float32")
    return (vector / np.linalg.norm(vector)).tolist()
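A quick sanity check on the output (reusing the all-MiniLM-L6-v2 model that the pipeline below initializes):

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

vec = get_embedding("How much does shipping cost?", embedder)
print(len(vec))                       # 384 dimensions
print(round(np.linalg.norm(vec), 4)) # 1.0 - unit length

Normalizing up front is a deliberate choice: with unit vectors, cosine similarity reduces to a dot product, so rankings stay consistent regardless of the distance metric the index uses.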
2.4 Data Validation with Pydantic
from typing import List
from pydantic import BaseModel, Field

class FAQ(BaseModel):
    question: str = Field(description="shipping and billing questions")
    answer: str = Field(description="shipping and billing answers")
    tags: List[str] = Field(description="tags for related customer questions such as payment options, refund process, billing and shipping information")
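Why bother with Pydantic here? Validation catches malformed records before they reach the database. A small illustration, using the sample FAQ from section 2.2:

from pydantic import ValidationError

# A well-formed record parses cleanly...
faq = FAQ(
    question="How much does shipping cost?",
    answer="Standard shipping is free for orders over $50, otherwise it's $4.99.",
    tags=["pricing", "shipping delivery"],
)

# ...while a malformed one fails loudly instead of silently corrupting the index
try:
    FAQ(question="Hi", answer="Hello", tags="not-a-list")
except ValidationError as err:
    print(err)  # reports that 'tags' must be a list of strings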
2.5 Schema Bridge: Pydantic to Weaviate
This function converts your Pydantic data model into Weaviate's schema:
def convert_schema_to_weaviate(model: BaseModel) -> List[Property]:
    """Convert Pydantic model fields to Weaviate properties with robust type handling."""
    # Handles: str, int, bool, List[str], nested models
    # Returns Weaviate-ready Property objects
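The full implementation is in the notebook; here is a minimal sketch of the idea, assuming Pydantic v2 and the v4 weaviate-client (nested models omitted for brevity):

from typing import List, get_args, get_origin
from pydantic import BaseModel
from weaviate.classes.config import DataType, Property

# Assumed mapping from Python annotations to Weaviate data types
SIMPLE_TYPES = {str: DataType.TEXT, int: DataType.INT,
                bool: DataType.BOOL, float: DataType.NUMBER}

def convert_schema_to_weaviate(model: type[BaseModel]) -> List[Property]:
    """Map each Pydantic field to a Weaviate Property (sketch)."""
    properties = []
    for name, field in model.model_fields.items():
        if get_origin(field.annotation) is list and get_args(field.annotation) == (str,):
            data_type = DataType.TEXT_ARRAY  # List[str] becomes a text array
        else:
            data_type = SIMPLE_TYPES[field.annotation]
        properties.append(Property(name=name, data_type=data_type,
                                   description=field.description or ""))
    return properties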
2.6 Build Weaviate Collection
This function creates your entire Weaviate collection, complete with vector indexing:
def build_weaviate_collection(
    client: weaviate.WeaviateClient,
    model: BaseModel,
    class_name: str,
    class_description: str = "",
    vector_index_config = Configure.VectorIndex.hnsw()
) -> List[Property]:
    """Turn a Pydantic model into a Weaviate collection."""
    # 1. Convert the model to schema properties
    # 2. Configure vector indexing
    # 3. Create the collection in the cloud
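Here, too, a sketch of what the body might look like with the v4 client. Since we generate embeddings ourselves, no server-side vectorizer is configured:

from weaviate.classes.config import Configure

def build_weaviate_collection(client, model, class_name, class_description=""):
    """Create a Weaviate collection from a Pydantic model (sketch)."""
    if client.collections.exists(class_name):
        return client.collections.get(class_name)  # idempotent across re-runs
    return client.collections.create(
        name=class_name,
        description=class_description,
        properties=convert_schema_to_weaviate(model),
        vector_index_config=Configure.VectorIndex.hnsw(),
        vectorizer_config=Configure.Vectorizer.none(),  # we supply our own vectors
    )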
2.7 Data Ingestion: From JSON to Vector Search
This is where you convert your raw FAQ data into searchable vectors:
def load_faqs_to_weaviate(
    collection_name: str,
    embedder: SentenceTransformer,
    file_path: str,
    client: weaviate.WeaviateClient
) -> None:
    """Embed and load FAQ data into an existing Weaviate collection using a SentenceTransformer."""
    # Pipeline: JSON validation -> embedding generation -> batch import
    # Includes error handling and duplicate prevention
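A minimal sketch of the ingestion path. Two assumptions worth flagging: we embed only the question text, and we derive a deterministic UUID from it so re-running the import cannot create duplicates:

import json
from weaviate.util import generate_uuid5

def load_faqs_to_weaviate(collection_name, embedder, file_path, client):
    """Validate, embed, and batch-import FAQs (sketch)."""
    with open(file_path) as f:
        raw_faqs = json.load(f)
    collection = client.collections.get(collection_name)
    with collection.batch.dynamic() as batch:  # batching amortizes network round-trips
        for raw in raw_faqs:
            faq = FAQ(question=raw["question"], answer=raw["answer"], tags=raw["tags"])
            batch.add_object(
                properties=faq.model_dump(),
                vector=get_embedding(faq.question, embedder),
                uuid=generate_uuid5(faq.question),  # same question, same ID: no duplicates
            )
    print(f"Successfully Loaded {len(raw_faqs)} FAQs into collection {collection_name}")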
2.8 Connecting It All Together
import weaviate
from weaviate.classes.init import Auth
from sentence_transformers import SentenceTransformer

# 1. Connect to Weaviate Cloud
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=weaviate_url,
    auth_credentials=Auth.api_key(weaviate_key)
)

# 2. Create the collection with our schema
name = "ecommerce_faqs"
desc = "Shipping, Billing, Customer queries"
print(client.collections.exists(name))  # False on a fresh cluster
build_weaviate_collection(client, FAQ, name, desc)

# 3. Initialize our embedding model
embedder = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# 4. Load and vectorize our FAQs
load_faqs_to_weaviate(
    collection_name="ecommerce_faqs",
    embedder=embedder,
    file_path="/kaggle/input/shipping-data-set/shipping_billing_faqs.json",
    client=client
)
The output:
Loaded 20 FAQs from /kaggle/input/shipping-data-set/shipping_billing_faqs.json
Generating embeddings and importing FAQs for collection class.....ecommerce_faqs
Successfully Loaded 20 FAQs into collection ecommerce_faqs
We built:
- A vector database connection to Weaviate Cloud
- Schema creation with the necessary fields
- Embedding generation
- Data ingestion with vectorization and indexing
- An FAQ search engine
Yay, We're Ready to Search
We now have semantic understanding of customer questions, which means accurate, grounded answers. A Weaviate hybrid search finds the 3 most relevant FAQs:
# 1. Ask a natural language question
query = "Explain me about my delayed order"
collection = client.collections.get("ecommerce_faqs")

# 2. Perform hybrid search (keyword + semantic)
results = collection.query.hybrid(
    query=query,
    vector=get_embedding(query, embedder),
    limit=3,
    return_properties=["answer", "question"],
    return_metadata=["score"],
).objects
# print(results)  # uncomment to inspect the raw result objects

# 3. Display the results
for i, result in enumerate(results, 1):
    score = getattr(result.metadata, "score", "N/A")
    question = result.properties.get("question", "N/A")
    answer = result.properties.get("answer", "No answer available")
    print(f"I:{i}, Score{score}, Question={question}, Answer={answer}")
    print("****************")
Output
I:1, Score1.0, Question=Why is my order delayed?, Answer=Delays can occur due to weather conditions, carrier issues, customs processing, or incorrect address information. Please check your tracking number for detailed updates or contact our support team for assistance.
****************
I:2, Score0.6934776306152344, Question=How can I track my order?, Answer=You can track your order using the tracking link in your shipping confirmation email, or log into your account and visit the "Order History" section. Tracking updates are provided by the carrier every 24 hours.
****************
I:3, Score0.6007568836212158, Question=Can I change my payment method for an existing order?, Answer=You can change the payment method for an order that hasn't shipped yet. Please contact customer service with your order number and new payment details. Once an order ships, payment method changes are not possible.
****************
The Score Explained
- 0.8+: Excellent match (directly answers your question)
- 0.6-0.8: Good match (related information for your question)
- Below 0.6: Weak match (may not be relevant to your question)
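In practice, you can turn these bands into a guardrail so weak matches never reach the LLM. A sketch, assuming a 0.6 cutoff (tune it against your own query logs):

RELEVANCE_THRESHOLD = 0.6  # assumed cutoff - adjust based on your data

grounded = [r for r in results if (r.metadata.score or 0) >= RELEVANCE_THRESHOLD]
if not grounded:
    # Better to admit uncertainty than to hallucinate an answer
    print("Sorry, I couldn't find a confident answer - let me route you to a human agent.")
else:
    # The surviving FAQs become the grounding context for the LLM prompt
    context = "\n\n".join(r.properties["answer"] for r in grounded)
    print(context)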
This isn't just search - it's understanding. Our old keyword-only search failed whenever a question was phrased in different words; our RAG system handles that with ease. For those interested, here's the link to my code, so you can follow along or adapt it for your own purposes!
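One housekeeping note before we wrap up: the v4 Python client keeps its connections open, so close it once you're done to avoid resource warnings:

client.close()  # releases the Weaviate client's open connections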
What's Next?
Curious about what happens after filtering? In my next piece, I'll dive into how prompt engineering bridges filtered data with natural LLM responses. Stay tuned!!!
