How to Choose the Right Vector Database for a Production-Ready RAG Chatbot

Written by nee2112 | Published 2026/01/12
Tech Story Tags: vector-database | rag-systems | vector-embedding | vector-database-for-ai | rag-chatbot-architecture | weaviate-rag-tutorial | vector-database-comparison | hackernoon-top-story

TL;DR: A two-part guide to vector databases for RAG (Retrieval-Augmented Generation) chatbots. Part 1 evaluates four vector databases against real-world filtering requirements; Part 2 walks through building a reliable customer-service chatbot with the chosen database.

What You'll Learn

  • Part 1: Evaluating vector databases against real-world filtering requirements
  • Part 2: Step-by-step implementation for RAG

Part 1: The Vector Database Evaluation Journey

You've built a chatbot powered by a sophisticated large language model (LLM). It is incredibly confident and impresses everyone in the demo. Then a customer asks:

"Can I change my shipping address after placing an order?"


Your chatbot confidently responds:

"No, shipping addresses cannot be changed once an order is placed. You'll need to cancel and reorder."


Oops.


Actually, your policy allows address changes within 2 hours of ordering. That's one frustrated customer who just posted a 1-star review about your 'broken chatbot'. Multiply that by the 50 others who will ask the same question. This gets even riskier with pricing.


Imagine your chatbot confidently telling customers:

"Our product A costs $99/month," when you actually changed it to $79/month last quarter.

The Search for a Reliable Solution

When I set out to build a reliable chatbot for our customer service, I evaluated several approaches:

Option 1: Fine-Tuning - Too expensive, needs constant updating

Option 2: Bigger models - Higher costs with still-outdated knowledge

Option 3: RAG (Retrieval-Augmented Generation) - Promising, but with a critical catch.


RAG promised to tackle three core challenges of LLMs:

  • Hallucination: Plausible-sounding but fabricated information
  • Static Knowledge: The knowledge of an LLM is often frozen in time.
  • Compute Cost: Extremely high, and requires extensive GPU time, eating your budget


The question isn't whether to enhance your chatbot, but how to do it cost-effectively.

The Vector Database Dilemma

RAG promised to solve our problems, but introduced a new dilemma: "Which vector database should we use with our customer interactions?"


Our chatbot needs to handle queries like:

  • What is the return policy for electronics under $100?
  • Can I upgrade shipping for orders over $200?
  • I want to change my shipping address as I ordered last night.
  • Show me laptops between $800 and $1200 with 16GB RAM.
  • Find laptops under $500.


Notice the pattern? Real user queries combine numerical filtering with semantic understanding. The goal isn’t just to find similar texts, but to surface relevant information within specific constraints.


When we analysed our conversation logs, we found that queries included price, updated promotion news, and policies. That's when it hit me: Semantic search alone isn't enough for real business logic. This wasn't an edge case; it was core functionality.


Our vector database needed:

  • Semantic understanding (What are they asking?)
  • Numerical Filtering (Within what limits?)
  • Both together, seamlessly
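
To make that requirement concrete, here's a toy sketch with made-up products and precomputed similarity scores (no database involved): a numerical filter narrows the candidates, then semantic similarity ranks whatever survives.

```python
# Toy data: in a real system the similarity scores come from a vector index
products = [
    {"name": "Budget laptop", "price": 450, "similarity": 0.81},
    {"name": "Gaming laptop", "price": 1500, "similarity": 0.92},
    {"name": "Office laptop", "price": 480, "similarity": 0.77},
]

# Step 1: numerical filtering (within what limits?)
candidates = [p for p in products if p["price"] < 500]

# Step 2: semantic ranking (what are they asking?)
hits = sorted(candidates, key=lambda p: p["similarity"], reverse=True)
# The gaming laptop is the best semantic match, but the filter excludes it
```

A good vector database performs both steps inside the index rather than filtering after the fact, which is exactly what the comparison below probes.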

The Vector Database Shortlist

Four options emerged as serious contenders:

  • ChromaDB — the simplest option, optimized for ease of use
  • Pinecone — a fully managed solution with no infrastructure overhead
  • Milvus — the clear choice for large-scale deployments
  • Weaviate — a flexible platform with multiple hosting options


The question wasn't which was "best" in theory, but which was right for our specific, filter-heavy, production-ready chatbot. Let's start with a look at setup.

Vector Database Setup Comparison

| Feature | ChromaDB | Pinecone | Weaviate | Milvus |
|---|---|---|---|---|
| Installation | pip install chromadb | pip install pinecone-client | pip install weaviate-client | pip install pymilvus |
| Quick Start | Instant | API key only | Docker or cloud | Docker Compose |
| Setup Time | 5 minutes | 5-10 minutes | 3-5 minutes (cloud), 20 minutes (Docker) | 45+ minutes |
| Infrastructure | None needed | None needed | Docker/K8s/Cloud | Docker Compose / K8s |
| Free Tier Duration | Forever | 30 days | 14 days (cloud) | Trial (Zilliz) |
| After Free Tier | Still free | Pay or delete | Pay or self-host | Pay or self-host |
| Local Development | Excellent | Cloud-only | Docker simple | Complex |
| Learning Curve | Easy | Easy | Medium | Hard |

Query Comparison

We've seen the setup; the code tells the truer story. Let's see how each database handles the exact queries our customers ask.


Test Query 1: Give me laptops that are below $500

1. Weaviate

result = products.query.near_text(
    query="laptop",
    filters=Filter.by_property("price").less_than(500),
    limit=10
)
result.objects[0].properties['name']


2. Pinecone

result = index.query(
    vector=get_embedding("laptop"),
    filter={"price": {"$lt": 500}},
    top_k=10,
    include_metadata=True
)
result['matches'][0]['metadata']['name']


3. ChromaDB

result = collection.query(
    query_texts=["laptop"],
    where={"price": {"$lt": 500}},
    n_results=10
)
result['metadatas'][0][0]['name']


4. Milvus

result = collection.search(
    data=[get_embedding("laptop")],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    expr="price < 500",
    limit=10,
    output_fields=["name", "price"]
)
result[0][0].entity.get('name')


Let's examine what happens with a slightly more complex query: "Show me laptops between $800-$1200 with 16GB RAM."

1. Weaviate

result = products.query.hybrid(
    query="laptop 16GB RAM",
    filters=(
        Filter.by_property("price").greater_or_equal(800) &
        Filter.by_property("price").less_or_equal(1200) &
        Filter.by_property("ram").equal("16GB")
    ),
    limit=10
)

# Access results
for product in result.objects:
    print(f"{product.properties['name']}: ${product.properties['price']}")


2. Pinecone

result = index.query(
    vector=get_embedding("laptop 16GB RAM"),
    filter={
        "price": {"$gte": 800, "$lte": 1200},
        "ram": {"$eq": "16GB"}
    },
    top_k=10,
    include_metadata=True
)

# Access results
for match in result['matches']:
    print(f"{match['metadata']['name']}: ${match['metadata']['price']}")


3. ChromaDB

result = collection.query(
    query_texts=["laptop 16GB RAM"],
    where={
        "$and": [
            {"price": {"$gte": 800}},
            {"price": {"$lte": 1200}},
            {"ram": {"$eq": "16GB"}}
        ]
    },
    n_results=10
)

# Access results
for i, meta in enumerate(result['metadatas'][0]):
    print(f"{meta['name']}: ${meta['price']}")


4. Milvus

result = collection.search(
    data=[get_embedding("laptop 16GB RAM")],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    expr='price >= 800 && price <= 1200 && ram == "16GB"',
    limit=10,
    output_fields=["name", "price", "ram"]
)

# Access results
for hits in result:
    for hit in hits:
        print(f"{hit.entity.get('name')}: ${hit.entity.get('price')}")


Filter Syntax in Practice: Where developer experience meets production reality

| Database | Type Safety | Readability | Developer Experience | Best For |
|---|---|---|---|---|
| Weaviate | Compile-time validation (client-side) | Excellent - reads like natural language queries | Clean method chaining, intuitive API | Complex business logic with multiple filter conditions |
| Pinecone | Runtime validation (server-side) | Good - JSON/dictionary syntax | Simple dictionary-based filters | Managed infrastructure (zero ops) |
| ChromaDB | No validation (client-side only) | Okay - nested dictionary structure | Dictionary-based syntax, minimal learning curve | Prototypes and MVPs without complex filtering |
| Milvus | Runtime only (string parsing) | Complex - string expressions | String-based expressions, error-prone | High-performance, large-scale deployments |

The Decision: How We Found Our Perfect Match

After weeks of evaluation, technical deep-dives, and real prototyping, we arrived at a clear winner. But this wasn't about finding the "best" vector database—it was about finding the right partner for our specific journey.

Our Non-Negotiables: The Filter That Filtered Our Options

We built our decision framework around what truly mattered for our team, timeline, and business goals:

1. Week-1 Prototyping: We needed working code in 7 days, not 7 weeks

2. Future Self-Hosting: Cloud today, on-prem tomorrow without API changes

3. Zero-Cost Experimentation: Test ideas without budget approvals

4. Developer-First Experience: No DevOps PhD required

5. Complex Filtering + Hybrid Search: Our chatbot's core competency

6. Clean, Predictable Results: No black-box scoring mysteries

Why It Felt Like Finding "The One"

| Our Anxiety | Weaviate's Answer |
|---|---|
| We'll get stuck in DevOps hell | Single Docker container or cloud instance |
| Our prototype will take months | Working in hours, production-ready in days |
| Filtering will be hacky and slow | Native, optimized filtering during search |
| We'll outgrow it quickly | Scales beautifully to 10M+ vectors |
| The learning curve will stall us | Intuitive API our junior devs mastered instantly |


Part 2: Building Your RAG Chatbot

After that exhaustive evaluation, we've arrived at our destination: Weaviate! Yes, it was a journey of testing, comparing, and prototyping, but every step was necessary. We didn't just pick a tool; we found a solution that fits our team, our timeline, and our technical requirements. Now comes the exciting part: let's roll up our sleeves and build something. I promise the implementation is much smoother than the evaluation was!

Step 1: Setting up your Weaviate Cloud Instance

Before we dive into code, let's get our cluster and credentials ready.

1.1 Create your Weaviate Cloud Account

Head over to the Weaviate Cloud Console and sign up. The free tier gives you enough resources to follow along with this tutorial.

1.2 Launch a New Cluster

Click the "Create Cluster" button and configure.

  • Cluster Name
  • Cloud Provider
  • Region: Select the region closest to your users for low latency
  • Tier: Start with the Sandbox Tier - It's free and perfect for prototyping without cost concerns. However, it will expire after 14 days as I described above.

1.3 Secure Your Connection

Once your cluster is provisioned (this takes a few minutes), you will need to set up:

  • API Key: Click "Create API Key" to generate one. Treat it like a password: anyone with this key can access your entire vector database.

Step 2: Core Functions: The Engine of Our RAG System

I'll walk you through the key functions that make up our RAG system. For the complete implementation with all imports, helper functions, and configuration, check out the notebook at the end.

2.1 Loading Secret Keys

user_secret = UserSecretsClient()
weaviate_key = user_secret.get_secret("weaviate_key")  # Weaviate API key: the "ADMIN" API key
weaviate_url = user_secret.get_secret("weaviate_url")  # Weaviate URL: the "REST Endpoint"
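
Outside Kaggle you won't have `UserSecretsClient`; a common stand-in is environment variables. The variable names below are my own choice, not anything Weaviate mandates:

```python
import os

def load_weaviate_credentials() -> tuple:
    """Read the cluster URL and API key from environment variables.
    WEAVIATE_URL / WEAVIATE_API_KEY are hypothetical names; pick your own."""
    url = os.environ.get("WEAVIATE_URL", "")
    key = os.environ.get("WEAVIATE_API_KEY", "")
    if not (url and key):
        raise RuntimeError("Set WEAVIATE_URL and WEAVIATE_API_KEY first")
    return url, key
```

Failing fast here beats a confusing connection error deeper in the pipeline.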

2.2 Synthetic FAQ Data

I have generated a synthetic FAQ dataset that mirrors real customer service conversations. Here is the structure:

[
  {
      "question": "How much does shipping cost?",
      "answer": "Shipping costs depend on your order total, shipping method, and destination. Standard shipping is free for orders over $50, otherwise it's $4.99. Express shipping costs $9.99, and overnight shipping is $19.99.",
      "category": "Shipping",
      "subcategory": "costs",
      "tags": [
        "pricing",
        "shipping delivery"
      ]
    }
  ]

2.3 The Embedding Generator

This function transforms text into 384-dimensional normalized vectors suitable for similarity search in a database. Every FAQ gets converted into a mathematical fingerprint.

def get_embedding(text: str, embedder: SentenceTransformer) -> list:
    """Generate a normalized embedding for the given text."""
    embedding = embedder.encode([text])[0]
    vector = np.array(embedding).astype("float32")
    return (vector / np.linalg.norm(vector)).tolist()
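
The normalization step matters: unit-length vectors let you compute cosine similarity as a plain dot product. A dependency-free sketch of the same math:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Dot product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))

v = l2_normalize([3.0, 4.0])   # -> [0.6, 0.8]
self_sim = dot(v, v)           # a vector is maximally similar to itself: 1.0
```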


2.4 Data Validation with Pydantic

class FAQ(BaseModel):
    question: str = Field(description="shipping and billing questions")
    answer: str = Field(description="shipping and billing answers")
    tags: List[str] = Field(description="tags for related customer questions such as payment options, refund process, billing and shipping information")
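
As a quick sanity check, you can instantiate the model and round-trip it through a dict; this assumes Pydantic v2, whose `model_dump` replaces v1's `.dict()`:

```python
from typing import List
from pydantic import BaseModel, Field

class FAQ(BaseModel):
    question: str = Field(description="shipping and billing questions")
    answer: str = Field(description="shipping and billing answers")
    tags: List[str] = Field(description="related tags, e.g. payment options")

faq = FAQ(
    question="How much does shipping cost?",
    answer="Standard shipping is free for orders over $50.",
    tags=["pricing", "shipping delivery"],
)
data = faq.model_dump()  # plain dict, ready for embedding and batch import
```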

2.5 Schema Bridge: Pydantic to Weaviate

This function will convert your Pydantic data model into Weaviate's schema:

def convert_schema_to_weaviate(model: BaseModel) -> List[Property]:
    """Convert Pydantic model fields to Weaviate properties with robust type handling."""
    # Handles: str, int, bool, List[str], nested models
    # Returns Weaviate-ready Property objects
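
The conversion logic can be sketched with the standard library alone; the string data types and plain dicts below are stand-ins for Weaviate's `DataType` and `Property` objects, and the mapping itself is my assumption, not the article's exact implementation:

```python
from typing import List, get_type_hints

class FAQ:
    question: str
    answer: str
    tags: List[str]

# Hypothetical mapping from Python annotations to Weaviate-style data types
TYPE_MAP = {str: "text", int: "int", bool: "boolean", List[str]: "text[]"}

def fields_to_properties(model: type) -> List[dict]:
    """Convert a model's annotated fields into schema property dicts."""
    return [
        {"name": name, "dataType": TYPE_MAP.get(annotation, "text")}
        for name, annotation in get_type_hints(model).items()
    ]

props = fields_to_properties(FAQ)
# -> question/text, answer/text, tags/text[]
```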

2.6 Build Weaviate Collection

This function creates your Weaviate collection, complete with vector indexing.

def build_weaviate_collection(
    client: weaviate.WeaviateClient,
    model: BaseModel,
    class_name: str,
    class_description: str = "",
    vector_index_config=Configure.VectorIndex.hnsw()
) -> List[Property]:
    """Turn a Pydantic model into a Weaviate collection."""
    # 1. Convert model to schema properties
    # 2. Configure vector indexing
    # 3. Create collection in cloud

2.7 Load FAQs into Weaviate

This is where you convert your raw FAQ data into searchable vectors:

def load_faqs_to_weaviate(
    collection_name: str,
    embedder: SentenceTransformer,
    file_path: str,
    client: weaviate.WeaviateClient
) -> None:
    """Embed and load FAQ data into an existing Weaviate collection using a Sentence Transformer."""
    # JSON validation -> embedding generation -> batch import
    # Includes error handling and duplicate prevention
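
The import loop itself is mostly plumbing. A sketch of the chunking pattern (the batch size of 32 is an arbitrary choice of mine; the Weaviate client also ships its own batching helpers):

```python
def batches(items: list, size: int = 32):
    """Yield successive fixed-size chunks for batched import."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical FAQ records standing in for the real JSON file
faqs = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(70)]
chunk_sizes = [len(chunk) for chunk in batches(faqs)]
# 70 items -> chunks of 32, 32, 6
```

Batching keeps memory flat and lets a failed chunk be retried without re-importing everything.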

2.8 Connecting It All Together

#1. Connect to Weaviate cloud
client = weaviate.connect_to_weaviate_cloud(
    cluster_url= weaviate_url,
    auth_credentials= Auth.api_key(weaviate_key)
)

#2. Create the collection with our schema
name = "ecommerce_faqs"
desc = "Shipping, billing, and customer queries"
print(client.collections.exists(name))  # check whether it already exists
build_weaviate_collection(client, FAQ, name, desc)

#3. Initialize our embedding model
embedder = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

#4. Load and vectorize our FAQs
load_faqs_to_weaviate(
    collection_name="ecommerce_faqs",
    embedder= embedder,
    file_path= "/kaggle/input/shipping-data-set/shipping_billing_faqs.json",
    client=client
)


#This is the output:
Loaded 20 FAQs from /kaggle/input/shipping-data-set/shipping_billing_faqs.json
Generating embeddings and importing FAQs for collection class.....ecommerce_faqs
Successfully Loaded 20 FAQs into collection ecommerce_faqs

What We Built

  • Vector database connection to Weaviate Cloud
  • Schema creation with the necessary fields
  • Embedding generation
  • Data ingestion with vectorization and indexing
  • FAQ search engine

Yay, Ready for Search Now

We now have semantic understanding of customer questions, and with it accurate, grounded answers. Weaviate's hybrid search finds the 3 most relevant FAQs:

#1. Ask a natural language question
query = "Explain me about my delayed order"
collection = client.collections.get("ecommerce_faqs")

#2. Perform hybrid search (keyword + semantic)
results = collection.query.hybrid(
    query=query,
    vector=get_embedding(query, embedder),
    limit=3,
    return_properties=["answer", "question"],
    return_metadata=["score"],
).objects

print(results)

#3. Display the results
for i, result in enumerate(results, 1):
    score = getattr(result.metadata, "score", "N/A")
    question = result.properties.get("question", "N/A")
    answer = result.properties.get("answer", "No answer available")
    print(f"I:{i}, Score{score}, Question={question}, Answer={answer}")
    print("****************")


Output

I:1, Score1.0, Question=Why is my order delayed?, Answer=Delays can occur due to weather conditions, carrier issues, customs processing, or incorrect address information. Please check your tracking number for detailed updates or contact our support team for assistance.
****************
I:2, Score0.6934776306152344, Question=How can I track my order?, Answer=You can track your order using the tracking link in your shipping confirmation email, or log into your account and visit the "Order History" section. Tracking updates are provided by the carrier every 24 hours.
****************
I:3, Score0.6007568836212158, Question=Can I change my payment method for an existing order?, Answer=You can change the payment method for an order that hasn't shipped yet. Please contact customer service with your order number and new payment details. Once an order ships, payment method changes are not possible.
****************

The Score Explained

  • 0.8+: Excellent match (directly answers your question)
  • 0.6-0.8: Good match (related information for your question)
  • Below 0.6: Weak match (might not be relevant to your question)
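
These bands are easy to codify as a guardrail, for instance to suppress weak matches before they reach the LLM. The thresholds below follow the bands above and are tunable:

```python
def label_match(score: float) -> str:
    """Bucket a hybrid-search score into excellent / good / weak bands."""
    if score >= 0.8:
        return "excellent"
    if score >= 0.6:
        return "good"
    return "weak"

# Applied to the example scores from the output above
labels = [label_match(s) for s in (1.0, 0.6934776306152344, 0.6007568836212158)]
# -> ['excellent', 'good', 'good']
```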


This isn't just search; it's understanding. Our old keyword-only search broke whenever a customer phrased the question in different words; the RAG system handles those paraphrases and beats traditional search. For those interested, here's the link to my code, so you can follow along or adapt it for your purposes!

What's Next?

Curious about what happens after filtering? In my next piece, I'll dive into how prompt engineering bridges filtered data with natural LLM responses. Stay tuned!!!




Written by nee2112 | Specializing in AI, Analysis, and 10+ years of experience in software engineering
Published by HackerNoon on 2026/01/12