If you're building a chatbot, search engine, or any AI application that needs to "know stuff," you've probably bumped into a hard truth:
Large Language Models (LLMs) can't access your private or domain-specific data unless you feed it to them. Whether it's product documentation, internal policies, or real-time records, that knowledge stays out of reach until you explicitly pass it into the model.
Enter RAG (Retrieval-Augmented Generation).
RAG combines the creative power of an LLM with the factual accuracy of your own data. At its core, it relies on semantic search: finding the most relevant pieces of text based on meaning, not just keywords.
Instead of asking an LLM to hallucinate answers, RAG pipelines first retrieve relevant content from your data sources, then pass it into the model. The result? More accurate, grounded, and useful responses.
But how do you build the retrieval part?
In this tutorial, we’ll build a lightweight, fast, and cost-free semantic search API using:
- PostgreSQL + pgvector to store and query embeddings
- Transformers.js to run a MiniLM model in JavaScript, no cloud required
- Fastify for a blazing-fast web server
Note: This guide assumes you're comfortable with basic JavaScript/Node.js and have PostgreSQL installed. Some familiarity with REST APIs and vector embeddings will help, but we'll keep things practical and code-focused throughout.
Let's get started.
1. Setting Up PostgreSQL with pgvector
Install the vector extension in your PostgreSQL database:
CREATE EXTENSION IF NOT EXISTS vector;
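Note that CREATE EXTENSION only enables an extension that is already installed on the server, so if this errors you need to install pgvector first. The exact command depends on your platform; a few common options (adjust the PostgreSQL version to match your setup):

# Debian/Ubuntu with the PGDG packages
sudo apt install postgresql-16-pgvector

# macOS with Homebrew
brew install pgvector

# Or skip local installation entirely with the official Docker image
docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=postgres pgvector/pgvector:pg16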
Then, create a simple table to store documents and their embeddings:
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  title TEXT,
  content TEXT,
  embedding vector(384) -- dimensions of all-MiniLM-L6-v2
);
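A sequential scan is fine while you only have a handful of rows, but once the table grows past a few thousand documents you'll probably want an approximate nearest-neighbor index. Assuming pgvector 0.5 or newer, an HNSW index on the cosine distance operator looks like this:

-- Optional: speeds up similarity search on large tables (works with the <=> operator)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);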
2. Generating Embeddings with Transformers.js
Install the dependencies:
npm install pg @xenova/transformers
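One note before the code: the scripts below use import and top-level await, so either name your files with an .mjs extension or enable ES modules for the project:

npm pkg set type=module   # adds "type": "module" to package.json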
Use this script to embed your content and store it in Postgres:
import { Client } from 'pg';
import { pipeline } from '@xenova/transformers';

const db = new Client({ connectionString: 'postgres://localhost/yourdb' });

// Lazy-load the embedding pipeline so the model is only initialized once
let embedder = null;

async function generateEmbedding(text) {
  if (!embedder) {
    embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  }
  // Mean-pool the token embeddings and normalize to get a single 384-dimension vector
  const output = await embedder(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data);
}

const docs = [
  { title: 'Cats', content: 'Cats are independent and curious animals.' },
  { title: 'Space', content: 'The universe is vast and mostly unexplored.' },
  { title: 'Bananas', content: 'Bananas are a yellow tropical fruit.' },
];

await db.connect();

for (const doc of docs) {
  const vec = await generateEmbedding(doc.content);
  // pgvector accepts a string literal like '[0.1,0.2,...]' cast to ::vector
  const pgVector = `[${vec.join(',')}]`;
  await db.query(
    'INSERT INTO documents (title, content, embedding) VALUES ($1, $2, $3::vector)',
    [doc.title, doc.content, pgVector]
  );
  console.log(`Inserted: ${doc.title}`);
}

await db.end();
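Save this as, say, embed.js (the file name is just an example) and run it with node embed.js. On the first run, Transformers.js downloads the MiniLM model and caches it locally, so it will take noticeably longer than subsequent runs.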
3. Querying with pgvector
To find the most relevant content to a user's query, embed the query and compare it to your document vectors using cosine distance:
SELECT title, content, embedding <=> $1::vector AS score
FROM documents
ORDER BY score ASC
LIMIT 3;

The <=> operator returns the cosine distance. Lower means more similar.
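pgvector actually ships several distance operators: <-> for Euclidean (L2) distance, <#> for negative inner product, and <=> for cosine distance. Because we normalized the embeddings ({ normalize: true }), all three produce the same ranking here and only the raw scores differ, so you can compare them side by side if you're curious:

-- Same ordering for normalized vectors; only the score values differ
SELECT title,
       embedding <-> $1::vector AS l2_distance,
       embedding <#> $1::vector AS neg_inner_product,
       embedding <=> $1::vector AS cosine_distance
FROM documents
ORDER BY cosine_distance
LIMIT 3;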
4. Building the Fastify Search API
Install Fastify and the CORS plugin:

npm install fastify @fastify/cors

Then wire the same embedding helper into a /search endpoint:
import Fastify from 'fastify';
import cors from '@fastify/cors';
import { Pool } from 'pg';
import { pipeline } from '@xenova/transformers';

const fastify = Fastify();
await fastify.register(cors, { origin: '*' });

const pool = new Pool({ connectionString: 'postgres://localhost/yourdb' });

// Same lazy-loaded embedding pipeline as in the ingestion script
let embedder = null;

async function generateEmbedding(text) {
  if (!embedder) {
    embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  }
  const output = await embedder(text, { pooling: 'mean', normalize: true });
  // Return the pgvector string literal directly, e.g. '[0.12,-0.03,...]'
  return `[${Array.from(output.data).join(',')}]`;
}

fastify.post('/search', async (req, res) => {
  const { query } = req.body ?? {};
  if (!query) return res.status(400).send({ error: 'Query is required' });

  // Embed the query, then rank documents by cosine distance (lower = closer)
  const vector = await generateEmbedding(query);
  const { rows } = await pool.query(
    `SELECT title, content, embedding <=> $1::vector AS score
     FROM documents
     ORDER BY score ASC
     LIMIT 3`,
    [vector]
  );
  return rows;
});

await fastify.listen({ port: 3000 });
console.log('API ready at http://localhost:3000');
Start the server (for example node server.js, if that's what you named the file) and test it with:
curl -X POST http://localhost:3000/search \
  -H "Content-Type: application/json" \
  -d '{"query": "Tell me about fruit"}'
5. Use Cases and Next Steps
This stack is perfect for:
- RAG pipelines (feed results into an LLM; a sketch follows this list)
- Internal knowledge search
- Chatbot memory lookup
- Smart filtering with natural language
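For the RAG case, the glue code is tiny: call /search, put the retrieved documents into a prompt, and pass it to whatever LLM you use. Here's a minimal sketch, assuming Node 18+ for the built-in fetch; callLLM is a placeholder for your model client, not a real API:

// Sketch: turn /search results into a grounded prompt for an LLM.
// callLLM is a stand-in for your provider's client (OpenAI, Mistral, a local model, ...).
async function answerWithContext(question) {
  const res = await fetch('http://localhost:3000/search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: question }),
  });
  const docs = await res.json();

  // Number the retrieved snippets so the model can refer to them
  const context = docs
    .map((d, i) => `[${i + 1}] ${d.title}: ${d.content}`)
    .join('\n');

  const prompt = `Answer the question using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;

  return callLLM(prompt); // placeholder: swap in your actual LLM call
}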
What to add next:
- Metadata filtering in queries
- Chunking for longer docs (a simple example follows below)
- Hybrid search (text + vector)
- Integration with OpenAI or Mistral
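Chunking is usually the first of these you'll need: MiniLM-style embedding models only represent the first few hundred tokens of their input, so long documents should be split into smaller passages before embedding. A naive sketch that packs sentences into chunks under a word budget (the 200-word default is an assumption to tune for your content):

// Naive chunker: greedily packs sentences into chunks of at most ~maxWords words.
// The default budget is an assumption; tune it for your content and model limits.
function chunkText(text, maxWords = 200) {
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks = [];
  let current = [];
  let count = 0;

  for (const sentence of sentences) {
    const words = sentence.split(/\s+/).length;
    if (count + words > maxWords && current.length > 0) {
      chunks.push(current.join(' '));
      current = [];
      count = 0;
    }
    current.push(sentence);
    count += words;
  }
  if (current.length > 0) chunks.push(current.join(' '));
  return chunks;
}

// Usage: embed each chunk and store one row per chunk instead of one per document
// for (const chunk of chunkText(longDocument)) { ... }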
Conclusion
Vector search is no longer just for ML engineers. With pgvector, Transformers.js, and Fastify, you can build your own semantic search engine in under an hour, without vendor lock-in. It isn't production-ready as-is, but it gives you a solid baseline to build on for your production apps.
Happy hacking!
