If you're building a chatbot, search engine, or any AI application that needs to "know stuff," you've probably bumped into a hard truth: Large Language Models (LLMs) can't access your private or domain-specific data unless you feed it to them. Whether it's product documentation, internal policies, or real-time records, the model only knows what you explicitly pass in.

Enter RAG (Retrieval-Augmented Generation). RAG combines the creative power of an LLM with the factual accuracy of your own data. At its core, it relies on semantic search: finding the most relevant pieces of text based on meaning, not just keywords.

Instead of asking an LLM to hallucinate answers, RAG pipelines first retrieve relevant content from your data sources, then pass it into the model. The result? More accurate, grounded, and useful responses.

But how do you build the retrieval part? In this tutorial, we'll build a lightweight, fast, and cost-free semantic search API using:

- PostgreSQL + pgvector to store and query embeddings
- Transformers.js to run a MiniLM model in JavaScript, no cloud required
- Fastify for a blazing-fast web server

Note: This guide assumes you're comfortable with basic JavaScript/Node.js and have PostgreSQL installed. Some familiarity with REST APIs and vector embeddings will help, but we'll keep things practical and code-focused throughout.

Let's get started.

## 1. Setting Up PostgreSQL with pgvector

Install the vector extension in your PostgreSQL database:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

Then, create a simple table to store documents and their embeddings:

```sql
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  title TEXT,
  content TEXT,
  embedding vector(384) -- dimensions of all-MiniLM-L6-v2
);
```

## 2. Generating Embeddings with Transformers.js

Install the dependencies:

```bash
npm install pg @xenova/transformers
```

Use this script to embed your content and store it in Postgres:

```js
import { Client } from 'pg';
import { pipeline } from '@xenova/transformers';

const db = new Client({ connectionString: 'postgres://localhost/yourdb' });

// Lazily load the embedding pipeline so the model is only downloaded once.
let embedder = null;

async function generateEmbedding(text) {
  if (!embedder) {
    embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  }
  const output = await embedder(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data);
}

const docs = [
  { title: 'Cats', content: 'Cats are independent and curious animals.' },
  { title: 'Space', content: 'The universe is vast and mostly unexplored.' },
  { title: 'Bananas', content: 'Bananas are a yellow tropical fruit.' },
];

await db.connect();

for (const doc of docs) {
  const vec = await generateEmbedding(doc.content);
  // pgvector accepts vectors as string literals like '[0.1,0.2,...]'.
  const pgVector = `[${vec.join(',')}]`;
  await db.query(
    'INSERT INTO documents (title, content, embedding) VALUES ($1, $2, $3::vector)',
    [doc.title, doc.content, pgVector]
  );
  console.log(`Inserted: ${doc.title}`);
}

await db.end();
```
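If you save the script above as, say, `embed.js` (the file name is arbitrary) and run it with `node embed.js` (you may need `"type": "module"` in your package.json, since it uses ES imports and top-level await), you can confirm the rows landed and that each embedding has the expected 384 dimensions with a quick check in psql:

```sql
-- Sanity check: every document should report 384 dimensions.
SELECT id, title, vector_dims(embedding) AS dims FROM documents;
```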
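Three sample rows will happily be scanned sequentially, but once you load thousands of documents you'll likely want an approximate nearest-neighbor index. A minimal sketch, assuming pgvector 0.5 or newer; the `vector_ip_ops` operator class matches the inner-product operator we'll use for querying in the next step:

```sql
-- Optional: HNSW index for fast approximate search on larger tables (pgvector 0.5+).
CREATE INDEX ON documents USING hnsw (embedding vector_ip_ops);
```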
## 3. Querying with pgvector

To find the most relevant content for a user's query, embed the query with the same model and compare it to your document vectors:

```sql
SELECT title, content, embedding <#> $1::vector AS score
FROM documents
ORDER BY score ASC
LIMIT 3;
```

The `<#>` operator returns the negative inner product. Because our embeddings are normalized, this ranks results exactly as cosine similarity would: lower scores mean more similar documents.

## 4. Building the Fastify Search API

Install Fastify and its CORS plugin:

```bash
npm install fastify @fastify/cors
```

```js
import Fastify from 'fastify';
import cors from '@fastify/cors';
import { Pool } from 'pg';
import { pipeline } from '@xenova/transformers';

const fastify = Fastify();
await fastify.register(cors, { origin: '*' });

const pool = new Pool({ connectionString: 'postgres://localhost/yourdb' });

// Lazily load the embedding pipeline once and reuse it across requests.
let embedder = null;

async function generateEmbedding(text) {
  if (!embedder) {
    embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
  }
  const output = await embedder(text, { pooling: 'mean', normalize: true });
  // Return the pgvector string literal directly, ready for the query parameter.
  return `[${Array.from(output.data).join(',')}]`;
}

fastify.post('/search', async (req, reply) => {
  const { query } = req.body;
  if (!query) return reply.code(400).send({ error: 'Query is required' });

  const vector = await generateEmbedding(query);
  const { rows } = await pool.query(
    `SELECT title, content, embedding <#> $1::vector AS score
     FROM documents
     ORDER BY score ASC
     LIMIT 3`,
    [vector]
  );

  reply.send(rows);
});

fastify.listen({ port: 3000 }, () => {
  console.log('API ready at http://localhost:3000');
});
```

Test it with:

```bash
curl -X POST http://localhost:3000/search \
  -H "Content-Type: application/json" \
  -d '{"query": "Tell me about fruit"}'
```

You should get back a JSON array of the three closest documents, each with its title, content, and score; the Bananas entry should rank first.

## 5. Use Cases and Next Steps

This stack is perfect for:

- RAG pipelines (feed results into an LLM)
- Internal knowledge search
- Chatbot memory lookup
- Smart filtering with natural language

What to add next:

- Metadata filtering in queries (see the sketch after this list)
- Chunking for longer docs (a naive splitter is sketched below)
- Hybrid search (text + vector)
- Integration with OpenAI or Mistral
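Metadata filtering is mostly a schema change. A minimal sketch, assuming you add a hypothetical `category` column to the table and pass the desired value as a second query parameter:

```sql
-- Assumes: ALTER TABLE documents ADD COLUMN category TEXT;
SELECT title, content, embedding <#> $1::vector AS score
FROM documents
WHERE category = $2
ORDER BY score ASC
LIMIT 3;
```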
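For chunking, the idea is to split long documents into smaller, overlapping pieces before embedding, so each vector captures one focused idea. Here is only a naive character-based sketch; the `chunkText` helper and its default sizes are illustrative, not part of the code above. You would call it on each document's content and insert one row per chunk:

```js
// Naive character-based chunking with overlap. Real pipelines often split on
// sentences or tokens instead; the sizes here are arbitrary defaults.
function chunkText(text, chunkSize = 500, overlap = 50) {
  const chunks = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

// Usage idea: embed and store one row per chunk instead of per document.
// for (const chunk of chunkText(doc.content)) { await generateEmbedding(chunk); ... }
```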
## Conclusion

Vector search is no longer just for ML engineers. With pgvector, Transformers.js, and Fastify, you can build your own semantic search engine in under an hour, without vendor lock-in.

This setup isn't production-ready as-is, but it's a solid baseline for your production apps.

Happy hacking!