Like many students, I do not enjoy scrolling through endless PDFs. I'd skip the reading entirely if I could just ask my textbook questions. So, naturally, I did what any lazy-but-resourceful person would do: I dumped the entire PDF into an LLM and started asking questions, praying to God that the answers were accurate. Spoiler alert: they weren't. The answers were either vague, wrong, or just plain confusing. That's when I realized that large language models aren't magic (shocking, I know). They have context limits, and stuffing a whole book into one prompt is like trying to fit a watermelon into a ziplock bag.

So I started digging, and that's when I found the real MVP: RAG (Retrieval-Augmented Generation). With RAG, instead of force-feeding the model everything, you teach it where to look, and suddenly the answers start making sense.

Why Large Context Windows Don't Really Help (Much)

You might think, "Wait… but newer models have massive context windows, right? Shouldn't that fix the problem?" In theory? Yes. In practice? Meh. Even with context windows stretching up to 100k tokens (which sounds huge), you're still working with trade-offs:

- They're expensive to use.
- They often truncate or compress information.
- And unless your prompt is perfectly structured (which is rarely the case), the model still ends up hallucinating or giving generic responses.

It's like asking your friend to remember every word of a 300-page book and hoping they don't mess up the details. Not ideal.

RAG to the Rescue

RAG (Retrieval-Augmented Generation) is like giving your LLM a cheat sheet… but a really smart, targeted one.

Here's the flow:

1. You split your book into smaller chunks.
2. You store these chunks in a vector DB.
3. When a user asks a question, you don't give the model the entire book, just the most relevant parts.
4. Then the LLM crafts a solid, informed answer using only those parts.

Less noise. More signal. Way better answers.
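To make "just the most relevant parts" concrete before we build the real thing, here's a toy, self-contained sketch. It is purely my own illustration: the "database" is a Python list and "similarity" is naive word overlap. The real pipeline below swaps in proper embeddings and a vector database.

```python
# Toy illustration of retrieval (not the real pipeline below):
# rank chunks by naive word overlap with the question and keep the best ones.
def retrieve(question, chunks, top_k=2):
    q_words = set(question.lower().split())
    score = lambda chunk: len(q_words & set(chunk.lower().split()))
    return sorted(chunks, key=score, reverse=True)[:top_k]

chunks = [
    "RAG retrieves the most relevant chunks before the model answers.",
    "Large context windows are expensive and often lossy.",
    "Embeddings turn text into vectors that capture meaning.",
]
print(retrieve("why does RAG retrieve relevant chunks?", chunks))
```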
What Does the RAG Pipeline Look Like?

Imagine you're the middleman between your textbook and your model. Your job is to:

1. Split the content → Break the book into readable chunks
2. Convert them into vectors → Using an embedding model (Cohere)
3. Save those vectors → In a vector database (Pinecone)
4. When a question is asked:
   - Convert the question into a vector
   - Search the database for the chunks that are most similar (using cosine similarity)
   - Send the best matches + the question to a language model (I used Gemini)
   - Boom: you get a clear, helpful answer

And that's the heart of it. You're not replacing the model's brain, just giving it better memory.

My Stack: Simple, Powerful, Beginner-Friendly

Here's what I used:

- 🧠 Cohere – To turn both book content and questions into vectors (aka embeddings)
- 📦 Pinecone – To store and search those vectors super efficiently
- 💬 Gemini – To generate the final, natural-language response

You don't have to use these, but this combo is beginner-friendly, well-documented, and plays nicely together.

Step-by-Step: Build Your Own AskMyBook Bot

Okay, let's actually build the thing now. I used Google Colab (because free GPU and easy sharing), but this should work in any Python environment.

Step 1: Load and Chunk Your Book

I used the PyMuPDF library to extract text.

```
!pip install pymupdf
```

Now, let's extract the text:

```python
import fitz  # PyMuPDF

def extract_text_from_pdf(pdf_path):
    """Concatenate the text of every page in the PDF."""
    doc = fitz.open(pdf_path)
    text = ""
    for page in doc:
        text += page.get_text()
    return text

book_path = 'enter the path here'  # path to your PDF
book_text = extract_text_from_pdf(book_path)
```
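Before moving on, it's worth a quick look at what actually came out of the PDF. This check isn't part of the original steps, just a habit I'd recommend, since scanned, image-only PDFs often extract to almost nothing.

```python
# Quick sanity check on the extraction (optional):
# a scanned/image-only PDF will come back nearly empty.
print(f"Extracted {len(book_text):,} characters")
print(book_text[:300])  # peek at the start of the text
```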
Now, we'll split the book into chunks, making it more digestible.

```python
import re

def chunk_text(text, chunk_size=300, overlap=50):
    """Split text into overlapping word-based chunks."""
    words = re.findall(r'\S+', text)
    chunks = []
    for i in range(0, len(words), chunk_size - overlap):
        chunk = ' '.join(words[i:i + chunk_size])
        chunks.append(chunk)
    return chunks

chunks = chunk_text(book_text)
print(f"Total Chunks: {len(chunks)}")
print("Sample chunk:\n", chunks[0])
```

Here, each chunk has 300 words, with a 50-word overlap for context continuity. Think of it as giving the model a smooth flow between paragraphs.

Step 2: Create Embeddings with Cohere

Embeddings = turning text into numbers that reflect meaning. We'll use Cohere's embed-english-v3.0 model for this.

```
!pip install cohere
```

```python
import cohere

co = cohere.Client("YOUR-API-KEY")  # Replace with your actual key

def get_embeddings(texts):
    """Embed a list of texts as document vectors."""
    response = co.embed(
        texts=texts,
        model="embed-english-v3.0",
        input_type="search_document"
    )
    return response.embeddings
```

Step 3: Store Chunks in Pinecone

Now we store the embeddings in Pinecone, a vector database that lets us search for similar chunks later.

```
!pip install pinecone
```

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR-API-KEY")

index_name = "ask-my-book"

if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1024,   # must match the embedding size
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

index = pc.Index(index_name)
```

This creates a cosine-similarity index on AWS us-east-1; the dimension has to match Cohere's embed-english-v3.0, which produces 1024-dimensional vectors.

Now, batch-upload the chunks:

```python
import time
import uuid

batch_size = 96  # Cohere's embed endpoint accepts at most 96 texts per call

for i in range(0, len(chunks), batch_size):
    batch_chunks = chunks[i:i + batch_size]
    batch_embeds = get_embeddings(batch_chunks)
    ids = [str(uuid.uuid4()) for _ in batch_chunks]
    vectors = list(zip(ids, batch_embeds, [{"text": t} for t in batch_chunks]))
    index.upsert(vectors=vectors)
    time.sleep(60)  # avoid hitting rate limits on the free tiers
```

Boom! Your book is now smartly stored in vector format.
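If you want to double-check that everything landed, Pinecone can report the index stats. This isn't part of the original walkthrough, just a quick verification that reuses the index and chunks from above.

```python
# Optional check: once the upserts settle, the reported vector count
# should equal the number of chunks we uploaded.
print(index.describe_index_stats())
print("Chunks uploaded:", len(chunks))
```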
Step 4: Ask Questions + Get Answers with Gemini

We'll search for the relevant chunks using your query, then pass those to Gemini to generate an answer.

First, get the query embedding:

```python
def get_query_embedding(query):
    """Embed the user's question with the matching query input type."""
    response = co.embed(
        texts=[query],
        model="embed-english-v3.0",
        input_type="search_query"
    )
    return response.embeddings[0]
```

Now, search Pinecone:

```python
def search_similar_chunks(query, top_k=5):
    """Return the text of the top_k most similar chunks."""
    query_embedding = get_query_embedding(query)
    result = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
    return [match['metadata']['text'] for match in result['matches']]
```

Then plug the top chunks into Gemini:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR-GEMINI-API-KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

def generate_answer(query):
    context_chunks = search_similar_chunks(query)
    context = "\n\n".join(context_chunks)

    prompt = f"""
You are an assistant bot trained on the following book content.
Use only the info provided to answer the user's question.

Book Context:
{context}

Question: {query}

If the question is not relevant to the context, respond with:
'I am a bot trained to answer questions based on the book content. This question is out of scope.'
"""
    response = model.generate_content(prompt)
    return response.text
```

Try it out!

```python
question = "What does the author say in Module 1 of the book?"
print(generate_answer(question))
```

That's It: You Now Have an Ask-My-Book Bot!

You built a bot that:

- Understands your textbook
- Finds the right part when asked
- Gives meaningful answers using that part only

No more endless skimming. Just type and ask.

What Next? Level Up Your Book-Bot

What we've built is a basic but powerful question-and-answer system. Think of it as the MVP (Minimum Viable Product) of your personal study assistant. But once you're comfortable, there's so much more you can add:

- Citations – Show which chunk or page the answer came from, so you can verify the source (see the sketch just below).
- Multi-turn Conversations – Let the bot remember previous questions and give more intelligent answers over time.
- Multi-step Reasoning – Chain thoughts together to answer complex questions.
- Custom Memory – Let your bot hold on to important facts you highlight for future queries.
- UI Upgrade – Hook this into a Streamlit or React frontend for a polished, user-friendly experience.

With these, your bot goes from "smart textbook" to "AI study buddy."
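As a taste of the first upgrade, here's a minimal citations sketch. The helper name generate_answer_with_sources is my own invention; it reuses the index, get_query_embedding, and generate_answer defined above and simply surfaces the matched chunks (with their similarity scores) next to the answer. A proper version would also store page numbers in the chunk metadata.

```python
# Minimal citation sketch: show which chunks (and similarity scores) back the answer.
# Note: generate_answer re-runs its own search internally, which is fine for a demo.
def generate_answer_with_sources(query, top_k=5):
    query_embedding = get_query_embedding(query)
    result = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
    sources = [(m['score'], m['metadata']['text'][:120]) for m in result['matches']]
    return generate_answer(query), sources

answer, sources = generate_answer_with_sources("What does the author say in Module 1 of the book?")
print(answer)
for score, snippet in sources:
    print(f"[score {score:.2f}] {snippet}...")
```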
If you've ever stared at a textbook, praying it would just talk back and tell you what matters, well, now it can.

This was my little experiment in turning boring PDFs into interactive conversations. Hope it inspires you to build your own, and maybe even customize it for friends or classes.

Got stuck somewhere? Want help with adding a UI or citations next? Drop a comment or ping me; I'm always happy to chat.