Hello people! Ever since ChatGPT was released in 2022, I’m sure not many developers have tried to build their own small chatbots without using APIs or integrating LLMs. That got me thinking: do they even remember how we used to build the world’s simplest chatbots for fun, back then? Maybe. Maybe not.

But just in case you weren’t actively interested in programming before the rise of LLMs, here’s how we did it, in a tutorial. And a fun fact: this code is from a tutorial I wrote in 2021.

The process here is simply to use the `chatterbot` library to train a simple chatbot on a dataset of conversation data, using Python.

Setting up the environment

Before writing code, you have to install the dependencies:

```bash
pip install chatterbot chatterbot_corpus pyjokes pandas  # the command depends on your OS
```

And then import them:

```python
import spacy
from spacy.cli import download
import pandas as pd
from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer
import pyjokes
```

The `chatterbot` library we have used is a Python module that lets you train chatbots using your own datasets. Speaking of datasets, bringing your own is one of the best things about building a chatbot this way.

This means that you have full access to the data used by your chatbot (unlike with LLMs or APIs), and you can literally create the dataset from scratch. It’s an exhausting task, but it’s like building your own small language model. So if you want this chatbot to have an “attitude” or speak in legal terms, you just have to create the dataset that way.

For this tutorial, however, I’ll be using the “3K Conversations Dataset for ChatBot” dataset from Kaggle by Kreesh Rajani. Kudos to them!
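If you do want to roll your own dataset instead, here is a minimal sketch of what that could look like. The sample pairs, the `write_dataset` helper, and the `my_conversations.csv` filename are all made up for illustration; the only thing taken from this tutorial is the `question` / `answer` column layout that the training code reads.

```python
import csv

# Hypothetical sample pairs; swap in your own conversations.
pairs = [
    ("hi", "Oh, it's you again."),
    ("how are you", "Busy. What do you want?"),
    ("what is your name", "MyLittleBot. Don't wear it out."),
]


def write_dataset(path, rows):
    """Write (question, answer) pairs to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["question", "answer"])  # column names used by the training code
        writer.writerows(rows)


# Hypothetical filename; point pd.read_csv at whatever name you choose.
write_dataset("my_conversations.csv", pairs)
```

Three pairs won’t get you a chatty bot, of course. The point is only that the “attitude” lives entirely in the answer column.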
Download the dataset here: https://www.kaggle.com/datasets/kreeshrajani/3k-conversations-dataset-for-chatbot

After downloading, extract the file into your project directory (i.e., the same folder as your Python script).

A note: The “standard” way to create a dataset for chatbots is to create it as a CSV file, but using a text file with phrases separated by commas is still a legit way to do it.

Writing the code

This is the biggest deal here, after creating (or stealing) a dataset. What we do here is similar to building an ML model, just a lot easier, simpler, and quicker, thanks to both the `chatterbot` library and the smaller dataset.

First, we’ll create an instance of the chatbot and load the CSV file as a dataset:

```python
# Create ChatBot instance
# read_only=True stops the bot from learning from what you type while chatting
chatbot = ChatBot('MyLittleBot', read_only=True)

# Load CSV and prepare training data
df = pd.read_csv('Conversation.csv')  # Make sure the filename matches
conversations = df[['question', 'answer']].values.tolist()
```

Then, we simply train the chatbot using the built-in trainer:

```python
# Flatten the Q&A pairs into one long list: question, answer, question, answer, ...
training_data = []
for question, answer in conversations:
    # str() guards against empty cells, which pandas loads as NaN floats
    training_data.append(str(question))
    training_data.append(str(answer))

# Train the chatbot
trainer = ListTrainer(chatbot)
trainer.train(training_data)
```

And that’s pretty much it. The last step is the classic input-output conditions. This part is highly customizable.
For example, if you want the chatbot to respond in a certain way to a certain input, all you have to do is write an `if` condition. Say you want the chatbot to tell a joke when you type “be funny”; this is how you can write the code for that:

```python
elif message.strip().lower() == 'be funny':
    joke = pyjokes.get_joke(language='en', category='all')
    print(f"ChatBot (joke): {joke}")
```

After adding all the `if` conditions, finally add the `else` condition to let the chatbot answer all other queries. In our case, we’ll just limit it to these 2 conditions:

```python
# Chat loop
while True:
    message = input("You: ")
    command = message.strip().lower()  # normalize once for the checks below
    if command == 'bye':
        print("ChatBot: Bye!")
        break
    elif command == 'be funny':
        joke = pyjokes.get_joke()
        print("ChatBot (joke):", joke)
    else:
        response = chatbot.get_response(message)
        print("ChatBot:", response)
```

And voila! It’s ready to run.
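If you end up adding a lot of special inputs, the `if`/`elif` chain gets long quickly. One possible alternative is a lookup table that maps inputs to handler functions. This is only a sketch: `COMMANDS`, `tell_joke`, `fallback`, and `reply` are hypothetical names, and the two stand-in functions replace `pyjokes.get_joke()` and `chatbot.get_response()` so the snippet runs without any installs.

```python
import random

# Hypothetical stand-in for pyjokes.get_joke().
JOKES = [
    "There are 10 kinds of people: those who know binary and those who don't.",
    "A SQL query walks into a bar, sees two tables and asks: may I join you?",
]


def tell_joke():
    return random.choice(JOKES)


def fallback(message):
    # In the real bot this would be: str(chatbot.get_response(message))
    return f"(trained reply to: {message})"


# Special inputs mapped to the functions that answer them.
COMMANDS = {
    "be funny": tell_joke,
    "help": lambda: "Try 'be funny', or type 'bye' to quit.",
}


def reply(message):
    """Return a special-case answer if one is registered, else the fallback."""
    handler = COMMANDS.get(message.strip().lower())
    return handler() if handler else fallback(message)
```

Inside the chat loop, the whole `elif`/`else` part would then shrink to `print("ChatBot:", reply(message))`, and a new special input is just one more entry in the dictionary.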
Here’s the full code for the chatbot:

```python
import spacy
from spacy.cli import download
import pandas as pd
from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer
import pyjokes

# Create ChatBot instance
chatbot = ChatBot('MyLittleBot', read_only=True)

# Load CSV and prepare training data
df = pd.read_csv('Conversation.csv')  # Make sure the filename matches
conversations = df[['question', 'answer']].values.tolist()

# Flatten Q&A into one long list
training_data = []
for question, answer in conversations:
    training_data.append(str(question))
    training_data.append(str(answer))

# Train the chatbot
trainer = ListTrainer(chatbot)
trainer.train(training_data)

# Chat loop
while True:
    message = input("You: ")
    command = message.strip().lower()  # normalize once for the checks below
    if command == 'bye':
        print("ChatBot: Bye!")
        break
    elif command == 'be funny':
        joke = pyjokes.get_joke()
        print("ChatBot (joke):", joke)
    else:
        response = chatbot.get_response(message)
        print("ChatBot:", response)
```

If you chat with it a little, you can see how prehistoric it feels.
In fact, it can’t even answer a simple “how are you” in most cases. This is simply because it was trained on a small set of data; it’s the most basic type of chatbot you can ever build. It doesn’t have a context window, memory retention, or real language understanding; it’s just pattern matching against the dataset’s two columns: question and answer.

At the end of the day, the purpose of this tutorial was absolutely not to build an ML model or an AI model. It’s just an overview of how chatbots were built from the absolute starting point.

Thanks for reading! And a side note: I won’t be writing anything for the next few weeks (I’m working on something serious), so yep, see you in a month!

Another side note: I wrote & published this article in one sitting, so it’s not proofread. Please leave a comment if you see something weird.
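One last sketch for the curious: the “just pattern matching” idea can be imitated with nothing but the standard library. To be clear, this is not how `chatterbot` works internally; `difflib` is swapped in here purely as a stand-in, and `PAIRS`, `closest_answer`, and the `0.6` cutoff are made-up names and values for illustration.

```python
import difflib

# Hypothetical mini-dataset in the same question -> answer shape as the CSV.
PAIRS = {
    "hi": "hello",
    "how are you": "i am fine, thanks",
    "what is your name": "my name is MyLittleBot",
}


def closest_answer(message, pairs=PAIRS, cutoff=0.6):
    """Return the answer whose question looks most like the message, or None."""
    matches = difflib.get_close_matches(
        message.strip().lower(), list(pairs), n=1, cutoff=cutoff
    )
    return pairs[matches[0]] if matches else None
```

Calling `closest_answer("how are you?")` finds the stored “how are you” question because the two strings are nearly identical, while input that resembles nothing in the dataset returns `None`. There is no understanding involved, only string similarity, which is why a small dataset makes for such a clueless bot.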