Dive with me into the details of how you can use RAG to produce interesting results for questions related to a specific domain without needing to fine-tune your own model.
RAG, or Retrieval Augmented Generation, is a really complicated way of saying “Knowledge base + LLM.” It describes a system that adds extra data, on top of what the user provided, to the prompt before querying the LLM. Where does that additional data come from? It could come from a number of different sources like vector databases, search engines, other pre-trained LLMs, etc.
You can read my article for some more context about RAG.
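To make that concrete, here is a purely hypothetical sketch of what “adding extra data” means in practice: the retrieved text is simply pasted into the prompt alongside the user’s question before anything is sent to the model. The context string and question below are made up for illustration.
## Hypothetical illustration: the retrieved context would come from a knowledge base
retrieved_context = "LangSmith is a platform for testing and monitoring LLM applications."
user_question = "how can langsmith help with testing?"

augmented_prompt = f"""
Answer the following question only based on the given context
<context>
{retrieved_context}
</context>
Question: {user_question}
"""
## augmented_prompt (context + question) is what actually gets sent to the LLM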
Before we get to RAG, we need to discuss “Chains” — the raison d’être for Langchain’s existence. Per the Langchain documentation, Chains refer to sequences of calls — whether to an LLM, a tool, or a data preprocessing step. You can see more details in the experiments section.
A simple chain you can set up would look something like —
chain = prompt | llm | output
As you can see, this is very straightforward. You are passing a prompt to an LLM of choice and then using a parser to produce the output. You are using Langchain’s concept of “chains” to help sequence these elements, much like you would use pipes in Unix to chain together system commands like ls | grep file.txt.
The majority of use cases that use RAG today do so by using some form of vector database to store the information that the LLM uses to enhance the answer to the prompt. You can read how vector stores (or vector databases) store information and why it’s a better alternative than something like a traditional SQL or NoSQL store here.
There are several flavors of vector databases ranging from commercial paid products like Pinecone to open-source alternatives like ChromaDB and FAISS. I typically use ChromaDB because it’s easy to use and has a lot of support in the community.
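If you have never touched a vector store before, here is a minimal sketch of the basic workflow with ChromaDB through Langchain. The sample sentences are made up for illustration, and I am assuming the same local Ollama embedding model that shows up later in this article.
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

## Embed a few made-up sentences and store them in an in-memory Chroma collection
embeddings = OllamaEmbeddings(model="nomic-embed-text")
texts = [
    "LangSmith helps you trace and debug LLM applications.",
    "FAISS is a library for efficient similarity search.",
    "Pinecone is a managed, commercial vector database.",
]
vector_store = Chroma.from_texts(texts, embeddings)

## Similarity search embeds the query and returns the closest stored chunks
results = vector_store.similarity_search("How do I debug my LLM app?", k=1)
print(results[0].page_content)
The later examples in this article do exactly the same thing with FAISS; as far as the RAG chain is concerned, the vector store is interchangeable.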
At a high level, this is how RAG works with vector stores —
In practice, a RAG chain would look something like this —
docs_chain = create_stuff_documents_chain(llm, prompt)
retriever = vector_store.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, docs_chain)
If you were to simplify away the details of all these Langchain functions like create_stuff_documents_chain and create_retrieval_chain (you can read up on these in Langchain’s official documentation), what it really boils down to is something like —
context_data = vector_store
chain = context_data | prompt | llm
which at a high level isn’t all that different from the base case shown above. The main difference is the inclusion of the contextual data that is provided with your prompt. And that, my friends, is RAG.
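If you want to see that idea spelled out by hand, here is a rough sketch of the same chain written directly in Langchain’s expression language, without the helper functions. It assumes the retriever, prompt, llm, and output parser are defined as in the full examples later in this article; the format_docs helper is my own addition.
from langchain_core.runnables import RunnablePassthrough

## Glue the retrieved documents into a single string for the {context} slot
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

## Hand-rolled RAG chain: fetch context, fill the prompt, call the LLM, parse the output
rag_chain = (
    {"context": retriever | format_docs, "input": RunnablePassthrough()}
    | prompt
    | llm
    | output
)
response = rag_chain.invoke("how can langsmith help with testing?")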
However, like all modern software, things can’t be THAT simple. So Langchain wraps all of this plumbing in a bit of syntactic sugar, namely the create_stuff_documents_chain and create_retrieval_chain calls shown earlier.
As you can see in the diagram above, there are many things happening to build an actual RAG-based system. However, if you focus on the “Retrieval chain,” you will see that it is composed of 2 main components —
1. A retriever, which pulls the most relevant chunks for a query out of the vector store.
2. A documents chain, which stuffs those chunks into the prompt and passes it to the LLM.
Bob’s your uncle.
To demonstrate the effectiveness of RAG, I would like to know the answer to the question —
How can langsmith help with testing?
For those who are unaware, Langsmith is Langchain’s product offering, which provides tooling to help develop, test, deploy, and monitor LLM applications.
Since unpaid, locally run LLMs (as of April 2024) are not connected to the internet and were trained on data that predates Langsmith, Langsmith is not a concept these models know about. The point of using RAG here is to see whether providing context about Langsmith helps the LLM give a better answer to the question above.
from langchain_community.llms import Ollama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Simple chain invocation
## LLM + Prompt
llm = Ollama(model="mistral")
output = StrOutputParser()
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a skilled technical writer.",
        ),
        ("human", "{user_input}"),
    ]
)
chain = prompt | llm | output
## Winner winner chicken dinner
response = chain.invoke({"user_input": "how can langsmith help with testing?"})
print(":::ROUND 1:::")
print(response)
# :::ROUND 1::: SIMPLE RETRIEVAL
Langsmith, being a text-based AI model, doesn't directly interact with software or perform tests in the traditional sense. However, it can assist with various aspects of testing through its strong language processing abilities. Here are some ways Langsmith can contribute to testing:
1. Writing test cases and test plans: Langsmith can help write clear, concise, and comprehensive test cases and test plans based on user stories or functional specifications. It can also suggest possible edge cases and boundary conditions for testing.
2. Generating test data: Langsmith can create realistic test data for different types of applications. This can be especially useful for large datasets or complex scenarios where generating data manually would be time-consuming and error-prone.
3. Creating test scripts: Langsmith can write test scripts in popular automation frameworks such as Selenium, TestNG, JMeter, etc., based on the test cases and expected outcomes.
4. Providing test reports: Langsmith can help draft clear and concise test reports that summarize the results of different testing activities. It can also generate statistics and metrics from test data to help identify trends and patterns in software performance.
5. Supporting bug tracking systems: Langsmith can write instructions for how to reproduce bugs and suggest potential fixes based on symptom analysis and past issue resolutions.
6. Automating regression tests: While it doesn't directly execute automated tests, Langsmith can write test scripts or provide instructions on how to automate existing manual tests using tools like Selenium, TestComplete, etc.
7. Improving testing documentation: Langsmith can help maintain and update testing documentation, ensuring that all relevant information is kept up-to-date and easily accessible to team members.
The extra context is coming from a webpage that we will be loading into our vector store.
from langchain_community.llms import Ollama
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
# Invoke chain with RAG context
llm = Ollama(model="mistral")
## Load page content
loader = WebBaseLoader("https://docs.smith.langchain.com/user_guide")
docs = loader.load()
## Vector store things
embeddings = OllamaEmbeddings(model="nomic-embed-text")
text_splitter = RecursiveCharacterTextSplitter()
split_documents = text_splitter.split_documents(docs)
vector_store = FAISS.from_documents(split_documents, embeddings)
## Prompt construction
prompt = ChatPromptTemplate.from_template(
    """
    Answer the following question only based on the given context
    <context>
    {context}
    </context>
    Question: {input}
    """
)
## Retrieve context from vector store
docs_chain = create_stuff_documents_chain(llm, prompt)
retriever = vector_store.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, docs_chain)
## Winner winner chicken dinner
response = retrieval_chain.invoke({"input": "how can langsmith help with testing?"})
print(":::ROUND 2:::")
print(response["answer"])
# :::ROUND 2::: RAG RETRIEVAL
Langsmith is a platform that helps developers test and monitor their Large Language Model (LLM) applications in various stages of development, including prototyping, beta testing, and production. It provides several workflows to support effective testing:
1. Tracing: Langsmith logs application traces, allowing users to debug issues by examining the data step-by-step. This can help identify unexpected end results, infinite agent loops, slower than expected execution, or higher token usage. Traces in Langsmith are rendered with clear visibility and debugging information at each step of an LLM sequence, making it easier to diagnose and root-cause issues.
2. Initial Test Set: Langsmith supports creating datasets (collections of inputs and reference outputs) and running tests on LLM applications using these test cases. Users can easily upload, create on the fly, or export test cases from application traces. This allows developers to adopt a more test-driven approach and compare test results across different model configurations.
3. Comparison View: Langsmith's comparison view enables users to track and diagnose regressions in test scores across multiple revisions of their applications. Changes in the prompt, retrieval strategy, or model choice can have significant implications on the responses produced by the application, so being able to compare results for different configurations side-by-side is essential.
4. Monitoring and A/B Testing: Langsmith provides monitoring charts to track key metrics over time and drill down into specific data points to get trace tables for that time period. This is helpful for debugging production issues and A/B testing changes in prompt, model, or retrieval strategy.
5. Production: Once the application hits production, Langsmith's high-level overview of application performance with respect to latency, cost, and feedback scores ensures it continues delivering desirable results at scale.
✅ As you can see, there are very specific references to the testing capabilities of Langsmith, which we were able to extract thanks to the supplemental knowledge pulled from the webpage we loaded into the vector store.
We were able to augment the capabilities of the standard LLM with the specific domain knowledge required to answer this question.
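As a side note, if you want to sanity-check exactly which chunks the retriever handed to the model in Round 2, the dictionary returned by create_retrieval_chain also carries the retrieved documents alongside the answer (under a "context" key in the Langchain versions I have used):
## Optional: inspect the chunks that were stuffed into the prompt
for doc in response["context"]:
    print(doc.metadata.get("source"), doc.page_content[:100])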
This is just the tip of the iceberg with RAG. There is a world of optimizations and enhancements you can make to see the full power of RAG applied in practice. Hopefully, this is a good launchpad for you to try out the bigger and better things yourself!
To get access to the complete code, you can go here. Thanks for reading!
⭐ If you like this type of content, be sure to follow me or subscribe to https://a1engineering.substack.com/subscribe! ⭐