If you're building a chatbot, search engine, or any AI application that needs to "know stuff," you've probably bumped into a hard truth:
Large Language Models (LLMs) can't access your private or domain-specific data unless you feed it to them. Whether it's product documentation, internal policies, or real-time records, that knowledge stays out of reach until you explicitly pass it into the model.
Enter RAG (Retrieval-Augmented Generation).
RAG combines the creative power of an LLM with the factual accuracy of your own data. At its core, it relies on semantic search: finding the most relevant pieces of text based on meaning, not just keywords.
Instead of asking an LLM to hallucinate answers, RAG pipelines first retrieve relevant content from your data sources, then pass it into the model. The result? More accurate, grounded, and useful responses.
But how do you build the retrieval part?
In this tutorial, we’ll build a lightweight, fast, and cost-free semantic search API using:
- PostgreSQL + pgvector to store and query embeddings
- Transformers.js to run a MiniLM model in JavaScript, no cloud required
- Fastify for a blazing-fast web server
Note: This guide assumes you're comfortable with basic JavaScript/Node.js and have PostgreSQL installed. Some familiarity with REST APIs and vector embeddings will help, but we'll keep things practical and code-focused throughout.
Let's get started.
1. Setting Up PostgreSQL with pgvector
Install the vector extension in your PostgreSQL database:
CREATE EXTENSION IF NOT EXISTS vector;
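Note that CREATE EXTENSION only enables an extension that is already installed on the server, so if this errors you need to install pgvector first. The exact command depends on your platform; a few common options (adjust the PostgreSQL version to match your setup):

# Debian/Ubuntu with the PGDG packages
sudo apt install postgresql-16-pgvector

# macOS with Homebrew
brew install pgvector

# Or skip local installation entirely with the official Docker image
docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=postgres pgvector/pgvector:pg16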
Then, create a simple table to store documents and their embeddings:
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  title TEXT,
  content TEXT,
  embedding vector(384) -- dimensions of all-MiniLM-L6-v2
);
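A sequential scan is fine while you only have a handful of rows, but once the table grows past a few thousand documents you'll probably want an approximate nearest-neighbor index. Assuming pgvector 0.5 or newer, an HNSW index on the cosine distance operator looks like this:

-- Optional: speeds up similarity search on large tables (works with the <=> operator)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);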
2. Generating Embeddings with Transformers.js
Install the dependencies:
npm install pg @xenova/transformers
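One note before the code: the scripts below use import and top-level await, so either name your files with an .mjs extension or enable ES modules for the project:

npm pkg set type=module   # adds "type": "module" to package.json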
Use this script to embed your content and store it in Postgres:
import { Client } from 'pg';
import { pipeline } from '@xenova/transformers';

const db = new Client({ connectionString: 'postgres://localhost/yourdb' });

// Lazy-load the embedding pipeline so the model is only initialized once
let embedder = null;

async function generateEmbedding(text) {
  if (!embedder) {
    embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  }
  // Mean-pool the token embeddings and normalize to get a single 384-dimension vector
  const output = await embedder(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data);
}

const docs = [
  { title: 'Cats', content: 'Cats are independent and curious animals.' },
  { title: 'Space', content: 'The universe is vast and mostly unexplored.' },
  { title: 'Bananas', content: 'Bananas are a yellow tropical fruit.' },
];

await db.connect();

for (const doc of docs) {
  const vec = await generateEmbedding(doc.content);
  // pgvector accepts a string literal like '[0.1,0.2,...]' cast to ::vector
  const pgVector = `[${vec.join(',')}]`;
  await db.query(
    'INSERT INTO documents (title, content, embedding) VALUES ($1, $2, $3::vector)',
    [doc.title, doc.content, pgVector]
  );
  console.log(`Inserted: ${doc.title}`);
}

await db.end();
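Save this as, say, embed.js (the file name is just an example) and run it with node embed.js. On the first run, Transformers.js downloads the MiniLM model and caches it locally, so it will take noticeably longer than subsequent runs.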
3. Querying with pgvector
To find the most relevant content to a user's query, embed the query and compare it to your document vectors using cosine distance:
SELECT title, content, embedding <=> $1::vector AS score
FROM documents
ORDER BY score ASC
LIMIT 3;

The <=> operator returns the cosine distance. Lower means more similar.
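pgvector actually ships several distance operators: <-> for Euclidean (L2) distance, <#> for negative inner product, and <=> for cosine distance. Because we normalized the embeddings ({ normalize: true }), all three produce the same ranking here and only the raw scores differ, so you can compare them side by side if you're curious:

-- Same ordering for normalized vectors; only the score values differ
SELECT title,
       embedding <-> $1::vector AS l2_distance,
       embedding <#> $1::vector AS neg_inner_product,
       embedding <=> $1::vector AS cosine_distance
FROM documents
ORDER BY cosine_distance
LIMIT 3;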
4. Building the Fastify Search API
Install Fastify and the CORS plugin:

npm install fastify @fastify/cors

Then wire the same embedding helper into a /search endpoint:
import Fastify from 'fastify';
import cors from '@fastify/cors';
import { Pool } from 'pg';
import { pipeline } from '@xenova/transformers';

const fastify = Fastify();
await fastify.register(cors, { origin: '*' });

const pool = new Pool({ connectionString: 'postgres://localhost/yourdb' });

// Same lazy-loaded embedding pipeline as in the ingestion script
let embedder = null;

async function generateEmbedding(text) {
  if (!embedder) {
    embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  }
  const output = await embedder(text, { pooling: 'mean', normalize: true });
  // Return the pgvector string literal directly, e.g. '[0.12,-0.03,...]'
  return `[${Array.from(output.data).join(',')}]`;
}

fastify.post('/search', async (req, res) => {
  const { query } = req.body ?? {};
  if (!query) return res.status(400).send({ error: 'Query is required' });

  // Embed the query, then rank documents by cosine distance (lower = closer)
  const vector = await generateEmbedding(query);
  const { rows } = await pool.query(
    `SELECT title, content, embedding <=> $1::vector AS score
     FROM documents
     ORDER BY score ASC
     LIMIT 3`,
    [vector]
  );
  return rows;
});

await fastify.listen({ port: 3000 });
console.log('API ready at http://localhost:3000');
Start the server (for example node server.js, if that's what you named the file) and test it with:
curl -X POST http://localhost:3000/search \
  -H "Content-Type: application/json" \
  -d '{"query": "Tell me about fruit"}'
5. Use Cases and Next Steps
This stack is perfect for:
- RAG pipelines (feed results into an LLM; a sketch follows this list)
- Internal knowledge search
- Chatbot memory lookup
- Smart filtering with natural language
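For the RAG case, the glue code is tiny: call /search, put the retrieved documents into a prompt, and pass it to whatever LLM you use. Here's a minimal sketch, assuming Node 18+ for the built-in fetch; callLLM is a placeholder for your model client, not a real API:

// Sketch: turn /search results into a grounded prompt for an LLM.
// callLLM is a stand-in for your provider's client (OpenAI, Mistral, a local model, ...).
async function answerWithContext(question) {
  const res = await fetch('http://localhost:3000/search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: question }),
  });
  const docs = await res.json();

  // Number the retrieved snippets so the model can refer to them
  const context = docs
    .map((d, i) => `[${i + 1}] ${d.title}: ${d.content}`)
    .join('\n');

  const prompt = `Answer the question using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;

  return callLLM(prompt); // placeholder: swap in your actual LLM call
}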
What to add next:
- Metadata filtering in queries
- Chunking for longer docs (a simple example follows below)
- Hybrid search (text + vector)
- Integration with OpenAI or Mistral
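Chunking is usually the first of these you'll need: MiniLM-style embedding models only represent the first few hundred tokens of their input, so long documents should be split into smaller passages before embedding. A naive sketch that packs sentences into chunks under a word budget (the 200-word default is an assumption to tune for your content):

// Naive chunker: greedily packs sentences into chunks of at most ~maxWords words.
// The default budget is an assumption; tune it for your content and model limits.
function chunkText(text, maxWords = 200) {
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks = [];
  let current = [];
  let count = 0;

  for (const sentence of sentences) {
    const words = sentence.split(/\s+/).length;
    if (count + words > maxWords && current.length > 0) {
      chunks.push(current.join(' '));
      current = [];
      count = 0;
    }
    current.push(sentence);
    count += words;
  }
  if (current.length > 0) chunks.push(current.join(' '));
  return chunks;
}

// Usage: embed each chunk and store one row per chunk instead of one per document
// for (const chunk of chunkText(longDocument)) { ... }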
Conclusion
Vector search is no longer just for ML engineers. With pgvector, Transformers.js, and Fastify, you can build your own semantic search engine in under an hour, without vendor lock-in. It isn't production-ready as-is, but it gives you a solid baseline to build on for your production apps.
Happy hacking!
