This is the first part in a multi-part series on building Agents with OpenAI's Assistant API using the Python SDK.

## What Are Agents?

The way I like to look at it, an agent is really just a piece of software leveraging an LLM (Large Language Model) and trying to mimic human behavior. That means it can not only converse and understand language, but it can also perform actions that have an impact on the real world. These actions are typically called tools.

In this blog post, we will explore how to build an agent using OpenAI's Assistant API with their Python SDK. Part 1 will be just the skeleton of the assistant. That is, just the conversational part.

I chose to build a CLI app on purpose to be framework agnostic.

We will purposefully call our implementation an Agent and refer to the OpenAI SDK implementation as an Assistant to easily distinguish between the two. I use the terms `tools` and `functions` interchangeably when it comes to functions that the Agent is able to call. Part 2 will cover function calling in more detail.

## Prerequisites

To follow along with this tutorial, you will need the following:

- Python 3 installed on your machine
- An OpenAI API key
- Basic knowledge of Python programming

## OpenAI Assistant Concepts

- **Assistant**: An Assistant in the Assistants API is an entity configured to respond to user messages. It uses instructions, a chosen model, and tools to interact with functions and provide answers.
- **Thread**: A Thread represents a conversation or dialogue in the Assistants API. It is created for each user interaction and can contain multiple Messages, serving as a container for the ongoing conversation.
- **Message**: A Message is a unit of communication in a Thread. It contains text (and potentially files in the future) and is used to convey user queries or assistant responses within a Thread.
- **Run**: A Run is an instance of the Assistant processing a Thread. It involves reading the Thread, deciding whether to call tools, and generating responses based on the model's interpretation of the Thread's Messages.

## Setting Up the Development Environment

The first step is to create a virtual environment using `venv` and activate it. This will ensure that our dependencies are isolated from the system Python installation:

```bash
python3 -m venv venv
source venv/bin/activate
```

Let's install our only dependency: the `openai` package:

```bash
pip install openai
```

Create a `main.py` file and populate it with some basic runtime logic for our CLI app:

```python
while True:
    user_input = input("User: ")
    if user_input.lower() == 'exit':
        print("Exiting the assistant...")
        break
    print(f"Assistant: You said {user_input}")
```

Try it out by running `python3 main.py`:

```
User: hi
Assistant: You said hi
```

As you can see, the CLI accepts a User message as input, and our genius Assistant doesn't have a brain 🧠 yet, so he just repeats the message right back. Not so smart yet.

## The Agent

Now, the fun 😁 (or headaches 🤕) begins. I'll provide all the imports needed for the final class right now, so you don't rack your brain on where things are coming from, since I kept imports out of the later code samples for brevity.
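One more housekeeping note: to keep the snippets short, the constructor below hardcodes a placeholder API key. In practice, you would more likely read it from an environment variable; here's a minimal sketch, assuming the key lives in `OPENAI_API_KEY`:

```python
import os
import openai

# Pull the key from the environment instead of committing it to source control.
client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```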
Let's start by building an `Agent` class in a new file `agent.py`:

```python
import time

import openai
from openai.types.beta.threads.run import Run


class Agent:
    def __init__(self, name: str, personality: str):
        self.name = name
        self.personality = personality
        self.client = openai.OpenAI(api_key="sk-*****")
        self.assistant = self.client.beta.assistants.create(
            name=self.name,
            model="gpt-4-turbo-preview"
        )
```

In the class constructor, we initialize the OpenAI client as a class property by passing our OpenAI API key. Next, we create an `assistant` class property that maps to our newly created Assistant. We store `name` and `personality` as class properties for later use.

The `name` argument we are passing to the create method is just for identifying the Assistant in the OpenAI dashboard, and the AI is not actually aware of it at this point. You actually have to pass the name in the `instructions`, which we will see later.

You could already set `instructions` when creating the Assistant, but that would actually make your Assistant less flexible to dynamic changes.

You can update an Assistant by calling `client.beta.assistants.update`, but there is a better place to pass in dynamic values, which we will see when we get to Runs.

Note that if you pass `instructions` here and then again when creating a Run, the Assistant's `instructions` will be overwritten by the `instructions` of the Run. They do not complement each other, so choose one based on your needs: Assistant level for static instructions, or Run level for dynamic instructions.

For the model, I chose `gpt-4-turbo-preview` so that we can add function calling in part 2 of this series. You could use `gpt-3.5-turbo` if you want to save a few fractions of a penny while giving yourself a migraine of pure frustration down the line when we implement tools.

GPT 3.5 is terrible at calling tools; the hours I've lost trying to deal with it allow me to say that. 😝 I'll leave it at that, and more on this later.

## Creating a Thread, Adding Messages, and Retrieving the Last Message

After we create an agent, we will need to start a conversation thread:

```python
class Agent:
    # ... (rest of code)

    def create_thread(self):
        self.thread = self.client.beta.threads.create()
```

And we will want a way to add messages to that thread:

```python
class Agent:
    # ... (rest of code)

    def add_message(self, message):
        self.client.beta.threads.messages.create(
            thread_id=self.thread.id,
            role="user",
            content=message
        )
```

Note that at the moment, it is only possible to add messages with the role `user`. I believe OpenAI plans on changing this in a future release, as this is pretty limiting.

Now, we can get the last message in the thread:

```python
class Agent:
    # ... (rest of code)

    def get_last_message(self):
        return self.client.beta.threads.messages.list(
            thread_id=self.thread.id
        ).data[0].content[0].text.value
```

Next, we create a `run_agent` entry point method to test out what we have so far. Currently, the `run_agent` method just returns the last message in the thread. It doesn't actually perform a Run. It's still brainless.

```python
class Agent:
    # ... (rest of code)

    def run_agent(self):
        message = self.get_last_message()
        return message
```

Back in `main.py`, we create the agent and our first thread. We add a message to the thread, then return that same message back to the user, but this time coming from that live thread:

```python
from agent import Agent

agent = Agent(
    name="Bilbo Baggins",
    personality="You are the accomplished and renowned adventurer from The Hobbit. You act like you are a bit of a homebody, but you are always up for an adventure. You worry a bit too much about breakfast."
)

agent.create_thread()

while True:
    user_input = input("User: ")
    if user_input.lower() == 'exit':
        print("Exiting the agent...")
        break
    agent.add_message(user_input)
    answer = agent.run_agent()
    print(f"Assistant: {answer}")
```
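A quick aside before we run it: `messages.list` returns messages newest-first by default, which is why `get_last_message` can simply take `.data[0]`. If you ever want to print the whole transcript oldest-first, a small helper along these lines should work (my own addition, not part of the tutorial code):

```python
def print_history(agent: Agent):
    # order="asc" flips the default newest-first ordering of messages.list
    messages = agent.client.beta.threads.messages.list(
        thread_id=agent.thread.id,
        order="asc",
    )
    for msg in messages.data:
        print(f"{msg.role}: {msg.content[0].text.value}")
```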
Let's run it:

```
python3 main.py
User: hi
Assistant: hi
```

Still not very smart. Closer to a parrot 🦜 than a hobbit. In the next section, the real fun begins.

## Creating and Polling a Run

When you create a Run, you need to periodically retrieve the `Run` object to check the status of the run. This is called polling, and it sucks. You need to poll in order to determine what your agent should do next. OpenAI plans to add support for streaming to make this simpler. In the meantime, I will show you how to set up polling in this next section.

Note the `_` prefix on the following method names, which is the standard convention in Python for indicating that a method is intended for internal use and should not be accessed directly by external code.

First, let's create a `_create_run` helper method for creating a `Run`, and update `run_agent` to call this method:

```python
class Agent:
    # ... (rest of code)

    def get_breakfast_count_from_db(self):
        return 1

    def _create_run(self):
        count = self.get_breakfast_count_from_db()
        return self.client.beta.threads.runs.create(
            thread_id=self.thread.id,
            assistant_id=self.assistant.id,
            instructions=f"""
                Your name is: {self.name}
                Your personality is: {self.personality}

                Metadata related to this conversation:
                {{
                    "breakfast_count": {count}
                }}
            """,
        )

    def run_agent(self):
        run = self._create_run()  # add this line
        message = self.get_last_message()
        return message
```

Notice how we pass the `thread.id` and the `assistant.id` to create a run.

Remember how I said at the beginning that there was a better place to pass in dynamic instructions and data? That would be the `instructions` parameter when creating the Run. In our case, we could have the breakfast `count` be fetched from a database. This will allow you to easily pass in different relevant dynamic data every time you want to trigger an answer.

Now, your agent is aware of the world changing around it and can act accordingly.

I like to have a metadata JSON object in my instructions that keeps relevant dynamic context. This allows me to pass in data while being less verbose and in a format that the LLM understands really well.

Don't run this yet; it won't work, because we aren't waiting for the run to complete before getting the last message, so it will still be the last user message.

Let's solve this by building out our polling mechanism. First, we will need a way to repeatedly and easily retrieve a run, so let's add a `_retrieve_run` method:

```python
class Agent:
    # ... (rest of code)

    def _retrieve_run(self, run: Run):
        return self.client.beta.threads.runs.retrieve(
            run_id=run.id, thread_id=self.thread.id)
```

Notice how we need to pass both the `run.id` and the `thread.id` to find a specific run.
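If you're curious what a freshly retrieved run looks like before we automate the loop, you can create one and peek at its status by hand (a throwaway check, not part of the final code):

```python
# Throwaway check: create a run and inspect its status once.
run = agent._create_run()
print(agent._retrieve_run(run).status)  # e.g. "queued" or "in_progress"
```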
Now, add a `_poll_run` method to our `Agent` class, along with the `_cancel_run` helper it relies on:

```python
class Agent:
    # ... (rest of code)

    def _cancel_run(self, run: Run):
        self.client.beta.threads.runs.cancel(
            run_id=run.id, thread_id=self.thread.id)

    def _poll_run(self, run: Run):
        status = run.status
        start_time = time.time()
        while status != "completed":
            if status == 'failed':
                raise Exception(f"Run failed with error: {run.last_error}")
            if status == 'expired':
                raise Exception("Run expired.")

            time.sleep(1)
            run = self._retrieve_run(run)
            status = run.status

            elapsed_time = time.time() - start_time
            if elapsed_time > 120:  # 2 minutes
                self._cancel_run(run)
                raise Exception("Run took longer than 2 minutes.")
```

🥵 Phew, that's a lot... Let's unpack it.

`_poll_run` receives a `Run` object as an argument and extracts the current `status`. All the available statuses can be found in the OpenAI docs. We'll just use a few that suit our current purpose.

We then run a while loop to check for a completed status while handling a few error scenarios.

The actual billing of the Assistant API is a bit murky, so to be on the safe side, I opted to cancel my runs after 2 minutes, even though there is an `expired` status for when OpenAI cancels runs after 10 minutes. If a run takes more than 2 minutes, you probably have a problem anyway.

Since I also don't want to poll every few milliseconds, I throttle my requests by only polling every 1 second until I hit the 2-minute mark and cancel my run. You can adjust this to whatever you see fit. On each iteration, after the delay, we fetch the Run status again.

Now, let's plug all that into our `run_agent` method. You will notice we first create the run with `_create_run`, then we poll with `_poll_run` until we get an answer or an error is thrown, and finally, when the polling is finished, we retrieve the last message from the thread, which will now be from the agent.

We then return the message to our runtime loop, so it can be sent back to the user:

```python
class Agent:
    # ... (rest of code)

    def run_agent(self):
        run = self._create_run()
        self._poll_run(run)  # add this line
        message = self.get_last_message()
        return message
```

Voilà, now when you run your agent again, you will get a reply from our friendly Agent:

```
python3 main.py
User: hi
Assistant: Hello there! What adventure can we embark on today? Or perhaps, before we set out, we should think about breakfast. Have you had yours yet? I've had mine, of course – can't start the day without a proper breakfast, you know.
User: how many breakfasts have you had?
Assistant: Ah, well, I've had just 1 breakfast today. But the day is still young, and there's always room for a second, isn't there? What about you? How can I assist you on this fine day?
```

In part 2, we will add the ability for our Agent to call tools. You can find the full code on my GitHub.

Thank you for reading. Happy to hear any thoughts and feedback in the comments. Follow me on LinkedIn for more content like this: https://www.linkedin.com/in/jean-marie-dalmasso-1b5473141/