This is the first part of a deeper dive into Ollama and things that I have learned about local LLMs and how you can use them for inference-based applications. In this post, you will learn about — How to use Ollama How to create your own model in Ollama Using Ollama to build a chatbot To understand the basics of LLMs (including Local LLMs), you can refer to my previous post on this topic . here First, Some Background In the space of local LLMs, I first ran into LMStudio. While the app itself is easy to use, I liked the simplicity and maneuverability that Ollama provides. To learn more about Ollama you can go . here tl;dr: Ollama hosts its own curated list of models that you have access to. You can download these models to your local machine, and then interact with those models through a command line prompt. Alternatively, when you run the model, Ollama also runs an inference server hosted at port 11434 (by default) that you can interact with by way of APIs and other libraries like Langchain. As of this post, Ollama has 74 models, which also include categories like embedding models. How to Use Ollama Ollama for the OS of your choice. Once you do that, you run the command to confirm it’s working. It should show you the help menu — Download ollama Usage: ollama [flags] ollama [command] Available Commands: serve Start ollama create Create a model from a Modelfile show Show information for a model run Run a model pull Pull a model from a registry push Push a model to a registry list List models cp Copy a model rm Remove a model help Help about any command Flags: -h, --help help for ollama -v, --version Show version information Use "ollama [command] --help" for more information about a command. To use any model, you first need to “pull” them from Ollama, much like you would pull down an image from Dockerhub (if you have used that in the past) or something like Elastic Container Registry (ECR). Ollama ships with some default models (like which is Facebook’s open-source LLM) which you can see by running. llama2 ollama list Select the model (let’s say ) that you would like to interact with from the . You can now pull down this model by running the command phi Ollama library page ollama pull phi Once the download is complete, you can check to see whether the model is available locally by running — ollama list Now that the model is available, it is ready to be run with. You can run a model using the command — ollama run phi The accuracy of the answers isn’t always top-notch, but you can address that by selecting different models or perhaps doing some fine-tuning or implementing a RAG-like solution on your own to improve accuracy. What I have demonstrated above is how you can use Ollama models using the command line prompt. However, if you check the inference server that Llama has running you can see that there are programmatic ways of accessing this by hitting port 11434. If you wanted to use Langchain to access your Ollama model, you can use something like — from langchain_community.llms import Ollama from langchain.chains import RetrievalQA prompt = "What is the difference between an adverb and an adjective?" llm = Ollama(model="mistral") qa = RetrievalQA.from_chain_type( llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True, ) response = qa(prompt) How to Create Your Own Model in Ollama You can also create your own model variant using the concept of in Ollama. For more parameters to configure in your Modelfile, you can look at these docs. Modelfile Example Modelfile — # Downloaded from Hugging Face https://huggingface.co/TheBloke/finance-LLM-GGUF/tree/main FROM "./finance-llm-13b.Q4_K_M.gguf" PARAMETER temperature 0.001 PARAMETER top_k 20 TEMPLATE """ {{.Prompt}} """ # set the system message SYSTEM """ You are Warren Buffet. Answer as Buffet only, and do so in short sentences. """ Once you have the Modelfile, you can create your model using ollama create arjunrao87/financellm -f Modelfile where is the name of your LLM model and would be replaced by your ollama.com username (which also acts as the namespace of your online ollama registry). At this point, you can use your created model like any other model on Ollama. financellm arjunrao87 You can also choose to push your model to the remote ollama registry. To make this happen, you need to Create your account on ollama.com Add a new model Have the public keys set up to allow you to push models from your remote machine. Once you have created your local llm, you can push it to the ollama registry using — ollama push arjunrao87/financellm 🦄 Now, let’s get to the good part. Using Ollama to Build a Chatbot During my quest to use Ollama, one of the more pleasant discoveries was this ecosystem of Python-based web application builders that I came across. can be used to build a full-fledged chatbot like ChatGPT. As their page says, Chainlit Chainlit is an open-source Python package to build production ready Conversational AI I walked through a few of the Chainlit tutorials to get a handle on what you can do with chainlit, which includes things like creating sequences of tasks (called “steps”), enabling buttons and actions, sending images, and all kinds of things. You can follow this part of my journey . here Once I got the hang of Chainlit, I wanted to put together a straightforward chatbot that basically used Ollama so that I could use a local LLM to chat with (instead of say ChatGPT or Claude). With less than 50 lines of code, you can do that using Chainlit + Ollama. Isn’t that crazy? Chainlit as a library is super straightforward to use. I also used Langchain for using and interacting with Ollama. from langchain_community.llms import Ollama from langchain.prompts import ChatPromptTemplate import chainlit as cl The next step is to define how you want the loading screen of the chatbot to look, by using the decorator of chainlit — @cl.on_chat_start @cl.on_chat_start async def on_chat_start(): elements = [cl.Image(name="image1", display="inline", path="assets/gemma.jpeg")] await cl.Message( content="Hello there, I am Gemma. How can I help you?", elements=elements ).send() .... .... The interface is what Chainlit uses to send responses back to the UI. You can construct messages with the simple key, and then you can embellish it with things like which, in my case, I have added an to show an image when the user first logs in. Message content elements Image The next step is to invoke Langchain to instantiate Ollama (with the model of your choice), and construct the prompt template. The usage of the is to mostly maintain the separation of user contexts and histories, which just for the purposes of running a quick demo, is not strictly required. cl.user_session Chain is a Langchain interface called that is used to create custom chains. You can read more about that . Runnable here @cl.on_chat_start async def on_chat_start(): .... .... model = Ollama(model="mistral") prompt = ChatPromptTemplate.from_messages( [ ( "system", "You are a knowledgeable historian who answers super concisely", ), ("human", "{question}"), ] ) chain = prompt | model cl.user_session.set("chain", chain) Now, you have all the pieces to have a chatbot UI and accept user inputs. What do you do with the prompts the user provides? You will use the handler from Chainlit to do something with the message the user provided. @cl.on_message @cl.on_message async def on_message(message: cl.Message): chain = cl.user_session.get("chain") msg = cl.Message(content="") async for chunk in chain.astream( {"question": message.content}, ): await msg.stream_token(chunk) await msg.send() as the docs suggest “stream back chunks of the response async” which is what we want for our bot. chain.astream That is really it. A few imports, a couple of functions, a little bit of sugar, and you have a functional chatbot. For the full code, you can see my . GitHub 👏 If this content is interesting to you, hit that button or subscribe to my newsletter here → . It gives me the feedback that I need to do more or less of something! Thanks ❤️ https://a1engineering.beehiiv.com/subscribe Also published here