In this tutorial, we will create a chatbot system that answers questions using custom data from PDF files. The chatbot will use Next.js for the frontend, Material UI for the UI components, LangChain and OpenAI for working with language models, and Supabase to store the data and embeddings. By the end, you will have a fully functional chatbot that can answer questions based on the contents of uploaded PDF files.
Next.js
Next.js is a React framework developed by Vercel for building server-side rendered (SSR) and static web applications. It combines the core features of React with routing, bundling, and rendering capabilities for building optimized, scalable web applications.
OpenAI
The OpenAI module in Node.js provides a way to interact with OpenAI’s API, allowing developers to leverage powerful language models like GPT-3 and GPT-4. This module enables you to integrate advanced AI functionalities into your Node.js applications.
LangChain.js
LangChain is a framework for developing applications with large language models (LLMs). Originally developed for Python, it is also available for JavaScript and TypeScript as LangChain.js, which runs in Node.js. It provides tools to manage and integrate LLMs into your applications, chain calls to these models, and build complex workflows.
Large Language Models (LLMs) like OpenAI’s GPT-3.5 are trained on vast amounts of text data to understand and generate human-like text. They can generate responses, translate languages, and perform many other natural language processing tasks.
Supabase
Supabase is an open-source backend-as-a-service (BaaS) platform designed to help developers quickly build and deploy scalable applications. It offers a suite of tools and services that simplify database management, authentication, storage, and real-time capabilities, all built on top of PostgreSQL.
User Input: The user provides an input query.
Prompt Conversion: The input is converted into a standalone question.
Vector Conversion: The question is converted into a vector.
Nearest Match Search: The system searches for the nearest match in the vector store.
Response Generation: The system generates an answer based on the closest match.
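The five steps above can be sketched as a single function. This is a minimal illustration, not the code used later in this tutorial: the `PipelineDeps` interface and every function name in it are hypothetical placeholders for the roles played by the chat model, the embeddings model, and the vector store.

```typescript
// Hypothetical dependency signatures (assumptions for illustration only).
// In a real app these would wrap the chat model, the embeddings model,
// and the vector store.
interface PipelineDeps {
  toStandaloneQuestion: (input: string) => Promise<string>;
  embed: (text: string) => Promise<number[]>;
  searchNearest: (vector: number[], k: number) => Promise<string[]>;
  generateAnswer: (context: string, question: string) => Promise<string>;
}

// Runs the steps in order:
// user input -> standalone question -> vector -> nearest matches -> answer.
async function answerQuestion(
  input: string,
  deps: PipelineDeps,
  k: number = 2
): Promise<string> {
  const question = await deps.toStandaloneQuestion(input);
  const vector = await deps.embed(question);
  const matches = await deps.searchNearest(vector, k);
  // Join the matched chunks into one context string for the model.
  const context = matches.join("\n\n");
  return deps.generateAnswer(context, question);
}
```

Passing the dependencies in as functions keeps the flow easy to read and lets each step be swapped for a fake when testing.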
Before we start, ensure you have the following:
First, enable the pgvector extension (named “vector”) if it is not already enabled. It provides the vector column type and the distance operators used by our vector store:
create extension if not exists vector;
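Next, create a table named “documents”. This table stores the text content of our uploaded PDF files together with the embedding of each piece of text. The embedding column has 1536 dimensions, which must match the output size of the embedding model (OpenAI’s text-embedding-ada-002 and text-embedding-3-small both produce 1536-dimensional vectors):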
create table if not exists documents (
id bigint primary key generated always as identity,
content text,
metadata jsonb,
embedding vector(1536)
);
Now, we need a function to query our embedded data:
create or replace function match_documents (
query_embedding vector(1536),
match_count int default null,
filter jsonb default '{}'
) returns table (
id bigint,
content text,
metadata jsonb,
similarity float
) language plpgsql as $$
begin
return query
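-- Qualify every column with the table name: the output columns
-- (id, content, metadata) are also PL/pgSQL variables, so bare
-- names would raise an "ambiguous column reference" error.
select
documents.id,
documents.content,
documents.metadata,
1 - (documents.embedding <=> query_embedding) as similarity
from documents
where documents.metadata @> filter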
order by documents.embedding <=> query_embedding
limit match_count;
end;
$$;
The “match_documents” function performs the task of querying the embedded data. We will call this function in our Next.js app via Supabase Vector Store.
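To make the query logic concrete, here is a small in-memory sketch of what “match_documents” computes. It is an illustration only: the `DocumentRow` and `DocumentMatch` types and the `containsAll` helper are names invented for this example, and `containsAll` only compares top-level keys, which is a simplification of the jsonb `@>` containment operator. pgvector’s `<=>` operator is cosine distance, so similarity is one minus that distance.

```typescript
interface DocumentRow {
  id: number;
  content: string;
  metadata: Record<string, unknown>;
  embedding: number[];
}

interface DocumentMatch {
  id: number;
  content: string;
  metadata: Record<string, unknown>;
  similarity: number;
}

// Cosine distance, the quantity pgvector's `<=>` operator returns.
// Note: a zero-length vector would yield NaN here.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length.");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Simplified stand-in for `metadata @> filter`: every top-level key in
// the filter must exist in the metadata with a strictly equal value.
function containsAll(
  metadata: Record<string, unknown>,
  filter: Record<string, unknown>
): boolean {
  return Object.entries(filter).every(([key, value]) => metadata[key] === value);
}

// Filter, order by distance (closest first), then apply the optional limit.
// A null matchCount means "no limit", like `limit null` in PostgreSQL.
function matchDocuments(
  rows: DocumentRow[],
  queryEmbedding: number[],
  matchCount: number | null = null,
  filter: Record<string, unknown> = {}
): DocumentMatch[] {
  const ranked = rows
    .filter((row) => containsAll(row.metadata, filter))
    .map((row) => ({ row, distance: cosineDistance(row.embedding, queryEmbedding) }))
    .sort((x, y) => x.distance - y.distance)
    .map(({ row, distance }) => ({
      id: row.id,
      content: row.content,
      metadata: row.metadata,
      similarity: 1 - distance,
    }));
  return matchCount === null ? ranked : ranked.slice(0, matchCount);
}
```

In the database, the same work is done by the SQL function, which scales far better than scanning rows in application code.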
Next, we need to set up our tables for the chatbot system:
create table if not exists files (
id bigint primary key generated always as identity,
name text not null,
created_at timestamp with time zone default timezone('utc'::text, now()) not null
);
create table if not exists rooms (
id bigint primary key generated always as identity,
created_at timestamp with time zone default timezone('utc'::text, now()) not null
);
create table if not exists chats (
id bigint primary key generated always as identity,
room bigint references rooms(id) on delete cascade,
role text not null,
message text not null,
created_at timestamp with time zone default timezone('utc'::text, now()) not null
);
The “files” table will store details of the uploaded PDF files. This allows us to reference and filter the files in the “documents” table. Our chatbot system will query embedding data with the given “file id” selected in our app. This way, our chatbot system can manage multiple PDF files and focus on the context of a specific file.
The “rooms” table will store all the chat sessions, allowing users to have multiple chat sessions within our app.
Finally, the “chats” table will store all the messages from a particular chat session (room). The role column records who sent each message: “user” for messages typed by the user and “bot” for answers generated by the chatbot.
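Because the role column is plain text, it can help to narrow it to the two expected values on the TypeScript side. The following is an optional sketch; `ChatRole`, `isChatRole`, and `bubbleAlignment` are hypothetical names that do not appear in the rest of this tutorial.

```typescript
// The two role values this tutorial writes to the "chats" table.
type ChatRole = "user" | "bot";

// Type guard for values read back from the database, where the
// column is unconstrained text.
function isChatRole(value: string): value is ChatRole {
  return value === "user" || value === "bot";
}

// User messages are aligned to the right, bot messages to the left.
function bubbleAlignment(role: ChatRole): "flex-end" | "flex-start" {
  return role === "user" ? "flex-end" : "flex-start";
}
```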
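Create a new Next.js project. The file paths in this tutorial assume TypeScript, the src/ directory, the Pages Router, and the default @/* import alias, so choose those options when prompted:
$ npx create-next-app chatbot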
$ cd ./chatbot
Install the required dependencies:
npm install @langchain/community @langchain/core @langchain/openai @supabase/supabase-js langchain openai pdf-parse pdfjs-dist
Then install Material UI for building our interface. Feel free to use another UI library instead:
npm install @mui/material @emotion/react @emotion/styled
Create a file to connect your Next.js app to Supabase:
// src/libs/supabaseClient.ts
import { createClient, SupabaseClient } from "@supabase/supabase-js";
const supabaseUrl: string = process.env.NEXT_PUBLIC_SUPABASE_URL || "";
const supabaseAnonKey: string = process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY || "";
if (!supabaseUrl) throw new Error("Supabase URL not found.");
if (!supabaseAnonKey) throw new Error("Supabase Anon key not found.");
export const supabaseClient: SupabaseClient = createClient(supabaseUrl, supabaseAnonKey);
Create a file to set up LangChain and Embeddings:
// src/libs/openAI.ts
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
const openAIApiKey: string = process.env.NEXT_PUBLIC_OPENAI_API_KEY || "";
if (!openAIApiKey) throw new Error("OpenAI API key not found.");
export const llm = new ChatOpenAI({
openAIApiKey,
modelName: "gpt-3.5-turbo",
temperature: 0.9,
});
export const embeddings = new OpenAIEmbeddings(
{
openAIApiKey,
},
{ maxRetries: 0 }
);
Lastly, we need to update our Next.js config file. We will be using the Web PDF Loader from LangChain, which depends on the Node.js fs module and throws an error when bundled for the browser. Update your config file as follows:
/** @type {import('next').NextConfig} */
const nextConfig = {
reactStrictMode: true,
output: "export",
webpack: (config, { isServer }) => {
// See https://webpack.js.org/configuration/resolve/#resolvealias
config.resolve.alias = {
...config.resolve.alias,
sharp$: false,
"onnxruntime-node$": false,
};
config.experiments = {
...config.experiments,
topLevelAwait: true,
asyncWebAssembly: true,
};
config.module.rules.push({
test: /\.md$/i,
use: "raw-loader",
});
// Fixes npm packages that depend on `fs` module
if (!isServer) {
config.resolve.fallback = {
// Spread the existing fallbacks first; otherwise the ones Next.js defines are dropped.
...config.resolve.fallback,
fs: false, // resolve `fs` to an empty module in the browser bundle
"node:fs/promises": false,
module: false,
perf_hooks: false,
};
}
return config;
},
};
export default nextConfig;
Now our Next.js app is ready! Let’s continue building the chatbot system.
We will use these services to communicate with Supabase from our React components.
The file service handles file-related operations, such as fetching the list of files and saving a new file to the database.
// src/services/file.ts
import { embeddings } from "@/libs/openAI";
import { supabaseClient } from "@/libs/supabaseClient";
import { WebPDFLoader } from "@langchain/community/document_loaders/web/pdf";
import { SupabaseVectorStore } from "@langchain/community/vectorstores/supabase";
export interface IFile {
id?: number | undefined;
name: string;
created_at?: Date | undefined;
}
// Fetch the list of uploaded files from the Supabase database.
export async function fetchFiles(): Promise<IFile[]> {
const { data, error } = await supabaseClient
.from("files")
.select()
.order("created_at", { ascending: false })
.returns<IFile[]>();
if (error) throw error;
return data;
}
// Save a new file to the database, convert it to vectors, and store the vectors.
export async function saveFile(file: File): Promise<IFile> {
const { data, error } = await supabaseClient
.from("files")
.insert({ name: file.name })
.select()
.single<IFile>();
if (error) throw error;
const loader = new WebPDFLoader(file);
const output = await loader.load();
const docs = output.map((d) => ({
...d,
metadata: { ...d.metadata, file_id: data.id },
}));
await SupabaseVectorStore.fromDocuments(docs, embeddings, {
client: supabaseClient,
tableName: "documents",
queryName: "match_documents",
});
return data;
}
The room service handles operations related to chat rooms, such as fetching the list of rooms and creating a new room.
// src/services/room.ts
import { supabaseClient } from "@/libs/supabaseClient";
export interface IRoom {
id?: number | undefined;
created_at?: Date | undefined;
}
// Fetch the list of chat rooms from the Supabase database.
export async function fetchRooms(): Promise<IRoom[]> {
const { data, error } = await supabaseClient
.from("rooms")
.select()
.order("created_at", { ascending: false })
.returns<IRoom[]>();
if (error) throw error;
return data;
}
// Create a new chat room in the database.
export async function createRoom(): Promise<IRoom> {
const { data, error } = await supabaseClient
.from("rooms")
.insert({})
.select()
.single<IRoom>();
if (error) throw error;
return data;
}
The chat service handles operations related to chats, such as fetching the list of chats, posting a new chat, and getting an answer from the chatbot.
// src/services/chat.ts
import { embeddings, llm } from "@/libs/openAI";
import { supabaseClient } from "@/libs/supabaseClient";
import { SupabaseVectorStore } from "@langchain/community/vectorstores/supabase";
import { StringOutputParser } from "@langchain/core/output_parsers";
import {
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
} from "@langchain/core/prompts";
import {
RunnablePassthrough,
RunnableSequence,
} from "@langchain/core/runnables";
import { formatDocumentsAsString } from "langchain/util/document";
export interface IChat {
id?: number | undefined;
room: number;
role: string;
message: string;
created_at?: Date | undefined;
}
// Fetch the list of chats for a given room from the Supabase database.
export async function fetchChats(roomId: number): Promise<IChat[]> {
const { data, error } = await supabaseClient
.from("chats")
.select()
.eq("room", roomId)
.order("created_at", { ascending: true })
.returns<IChat[]>();
if (error) throw error;
return data;
}
// Post a new chat message to the database.
export async function postChat(chat: IChat): Promise<IChat> {
const { data, error } = await supabaseClient
.from("chats")
.insert(chat)
.select()
.single<IChat>();
if (error) throw error;
return data;
}
// Get an answer from the chatbot based on the user's chat message.
export async function getAnswer(chat: IChat, fileId: number): Promise<IChat> {
const vectorStore = await SupabaseVectorStore.fromExistingIndex(embeddings, {
client: supabaseClient,
tableName: "documents",
queryName: "match_documents",
});
const retriever = vectorStore.asRetriever({
filter: (rpc) => rpc.filter("metadata->>file_id", "eq", fileId),
k: 2,
});
const SYSTEM_TEMPLATE = `Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
{context}`;
const messages = [
SystemMessagePromptTemplate.fromTemplate(SYSTEM_TEMPLATE),
HumanMessagePromptTemplate.fromTemplate("{question}"),
];
const prompt = ChatPromptTemplate.fromMessages(messages);
const chain = RunnableSequence.from([
{
context: retriever.pipe(formatDocumentsAsString),
question: new RunnablePassthrough(),
},
prompt,
llm,
new StringOutputParser(),
]);
const answer = await chain.invoke(chat.message);
const { data, error } = await supabaseClient
.from("chats")
.insert({
role: "bot",
room: chat.room,
message: answer,
})
.select()
.single<IChat>();
if (error) throw error;
return data;
}
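To see what the model actually receives, the prompt assembly can be sketched without LangChain. This is an illustration under assumptions: `Doc`, `PromptMessage`, `joinDocuments`, and `buildMessages` are names invented here, and `joinDocuments` joins page contents with blank lines, which is how `formatDocumentsAsString` is understood to behave.

```typescript
interface Doc {
  pageContent: string;
}

interface PromptMessage {
  role: "system" | "user";
  content: string;
}

const SYSTEM_TEMPLATE = `Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
{context}`;

// Concatenate the retrieved chunks, separated by blank lines.
function joinDocuments(docs: Doc[]): string {
  return docs.map((doc) => doc.pageContent).join("\n\n");
}

// Fill the {context} placeholder and pair the system message with the question.
function buildMessages(docs: Doc[], question: string): PromptMessage[] {
  const context = joinDocuments(docs);
  return [
    {
      role: "system",
      // A replacer function avoids special handling of "$" sequences
      // that may appear in the document text.
      content: SYSTEM_TEMPLATE.replace("{context}", () => context),
    },
    { role: "user", content: question },
  ];
}
```

With `k: 2` in the retriever, `docs` would hold the two closest chunks for the selected file.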
The ChatRoom component handles the display and interaction of the chat interface.
// src/components/ChatRoom.tsx
import {
Box,
Button,
LinearProgress,
Stack,
TextField,
Typography,
} from "@mui/material";
import { ChangeEvent, MouseEvent, useEffect, useState } from "react";
import { IChat, fetchChats, getAnswer, postChat } from "@/services/chat";
export default function ChatRoom({
roomId,
fileId,
}: {
roomId: number;
fileId: number;
}) {
const [message, setMessage] = useState<string>("");
const [chats, setChats] = useState<IChat[]>([]);
const [submitting, setSubmitting] = useState(false);
const onChangeInput = (e: ChangeEvent<HTMLInputElement>) =>
setMessage(e.target.value);
const onSubmitInput = async (e: MouseEvent<HTMLElement>) => {
e.preventDefault();
if (!message) return;
let currChats = [...chats];
try {
setSubmitting(true);
const chat = await postChat({
role: "user",
room: roomId,
message,
});
setMessage("");
currChats.push(chat);
const answer = await getAnswer(chat, fileId);
currChats.push(answer);
setChats(currChats);
} catch (err) {
console.error(err);
} finally {
setSubmitting(false);
}
};
useEffect(() => {
(async () => {
try {
if (typeof roomId !== "undefined") {
const chats = await fetchChats(roomId);
setChats(chats);
}
} catch (err) {
console.error(err);
}
})();
}, [roomId]);
return (
<>
<Stack sx={{ gap: 2, mb: 2 }}>
{chats.map((chat, i) => (
<Box
key={i}
sx={{
display: "flex",
justifyContent: chat.role === "user" ? "flex-end" : "flex-start",
}}
>
<Box
sx={{
minWidth: "250px",
maxWidth: "1000px",
p: 2,
border: "1px solid #555",
borderRadius: (theme) => theme.spacing(2),
}}
>
<Typography
sx={{
whiteSpace: "pre-line",
wordBreak: "break-word",
mb: 2,
display: "block",
}}
>
{chat.message}
</Typography>
</Box>
</Box>
))}
</Stack>
{submitting && <LinearProgress />}
<TextField
fullWidth
multiline
minRows={2}
maxRows={10}
value={message}
label="Write Something ..."
onChange={onChangeInput}
sx={{ mb: 2 }}
/>
<Button
fullWidth
type="submit"
variant="contained"
onClick={onSubmitInput}
disabled={submitting}
>
<Typography>Send</Typography>
</Button>
</>
);
}
The FileUploader component handles uploading files.
// src/components/Fi
import { ChangeEvent, MouseEvent, useState } from "react";
import { Box, Button, Typography } from "@mui/material";
import { IFile, saveFile } from "@/services/file";
export default function FileUploader({
onSave,
}: {
onSave: (file: IFile) => void;
}) {
const [inputFile, setInputFile] = useState<File | undefined>(undefined);
const [uploading, setUploading] = useState<boolean>(false);
const onChangeFile = (e: ChangeEvent<HTMLInputElement>) => {
const file = e?.target?.files?.[0];
setInputFile(file);
};
const handleSaveFile = async (e: MouseEvent<HTMLElement>) => {
e.preventDefault();
if (!inputFile) return;
try {
setUploading(true);
const file = await saveFile(inputFile);
onSave(file);
} catch (err) {
console.error(err);
} finally {
setUploading(false);
}
};
return (
<>
<Box
component="label"
htmlFor="file-uploader"
sx={{ mb: 2, display: "block" }}
>
<input
accept="application/pdf"
id="file-uploader"
type="file"
style={{ display: "none" }}
onChange={onChangeFile}
/>
<Button variant="outlined" fullWidth component="span">
<Typography>{inputFile ? inputFile.name : "Select File"}</Typography>
</Button>
</Box>
<Button
fullWidth
variant="contained"
color="primary"
disabled={!inputFile || uploading}
onClick={handleSaveFile}
>
<Typography>Upload</Typography>
</Button>
</>
);
}
The Home component is the main page. It allows the user to create or select a chat room, upload a file, and select a file to chat about.
// src/pages/index.tsx
import ChatRoom from "@/components/ChatRoom";
import FileUploader from "@/components/FileUploader";
import { IFile, fetchFiles } from "@/services/file";
import { IRoom, createRoom, fetchRooms } from "@/services/room";
import {
Button,
Divider,
Grid,
List,
ListItemButton,
Typography,
} from "@mui/material";
import { MouseEvent, useEffect, useMemo, useState } from "react";
export default function Home() {
const [rooms, setRooms] = useState<IRoom[]>([]);
const [files, setFiles] = useState<IFile[]>([]);
const [roomId, setRoomId] = useState<number | undefined>(undefined);
const [fileId, setFileId] = useState<number | undefined>(undefined);
const onSaveFile = (file: IFile) => setFiles((v) => [file, ...v]);
const handleCreateRoom = async (e: MouseEvent<HTMLElement>) => {
e.preventDefault();
try {
const newRoom = await createRoom();
setRooms((v) => [newRoom, ...v]);
setRoomId(newRoom.id);
} catch (err) {
console.error(err);
}
};
const handleSelectRoom =
(id: number | undefined) => (e: MouseEvent<HTMLElement>) => {
e.preventDefault();
setRoomId(id);
};
const handleSelectFile =
(id: number | undefined) => (e: MouseEvent<HTMLElement>) => {
e.preventDefault();
setFileId(id);
};
useEffect(() => {
(async () => {
try {
const rooms = await fetchRooms();
setRooms(rooms);
const files = await fetchFiles();
setFiles(files);
} catch (err) {
console.error(err);
}
})();
}, []);
return (
<Grid container>
<Grid item xs={2} sx={{ p: 2 }}>
<Button fullWidth variant="contained" onClick={handleCreateRoom}>
New Chat
</Button>
<Divider sx={{ my: 2 }} />
<List>
{rooms.map((room, i) => (
<ListItemButton
selected={roomId === room.id}
key={i}
onClick={handleSelectRoom(room.id)}
>
{room.created_at?.toString()}
</ListItemButton>
))}
</List>
</Grid>
<Grid item xs={2} sx={{ p: 2 }}>
<FileUploader onSave={onSaveFile} />
<Divider sx={{ my: 2 }} />
<List>
{files.map((file, i) => (
<ListItemButton
selected={fileId === file.id}
key={i}
onClick={handleSelectFile(file.id)}
>
{file.name}
</ListItemButton>
))}
</List>
</Grid>
<Grid item xs sx={{ p: 2 }}>
{roomId && fileId ? (
<ChatRoom roomId={roomId as number} fileId={fileId as number} />
) : (
<Typography>Select one room and one file</Typography>
)}
</Grid>
</Grid>
);
}
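Create a .env file in the root of your project to store your environment variables. Because Next.js inlines every variable prefixed with NEXT_PUBLIC_ into the browser bundle, the OpenAI key is readable by anyone who can load the app, so use this setup for local experiments only: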
NEXT_PUBLIC_SUPABASE_URL=your-supabase-url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-supabase-anon-key
NEXT_PUBLIC_OPENAI_API_KEY=your-openai-api-key
Finally, start your Next.js application:
npm run dev
Now, you should have a running application where you can upload PDF files, chat with a bot trained on your data, and receive relevant responses based on the uploaded content.
This guide walked through building a custom chatbot that can answer questions based on uploaded PDF files. You learned how to set up your project, configure Supabase and OpenAI, create the necessary services, and build the frontend components with React and Material UI. With this foundation, you can extend and customize the chatbot to fit your specific needs.
Check out the source code in this repo: https://github.com/firstpersoncode/chatbot
Happy coding!