This is the first part in a multi-part series on building Agents with OpenAI's Assistants API using the Python SDK.

What Are Agents?

The way I like to look at it, an agent is really just a piece of software leveraging an LLM (Large Language Model) and trying to mimic human behavior. That means it can not only converse and understand language, but it can also perform actions that have an impact on the real world. These actions are typically called tools.

In this blog post, we will explore how to build an agent with OpenAI's Assistants API and their Python SDK. Part 1 covers just the skeleton of the assistant, that is, the conversational part.

I chose to build a CLI app on purpose to be framework agnostic. We will purposefully call our implementation an Agent and refer to the OpenAI SDK implementation as an Assistant to easily distinguish between the two.

I use the terms tools and functions interchangeably when it comes to functions that the Agent is able to call. Part 2 will cover function calling in more detail.

Prerequisites

To follow along with this tutorial, you will need the following:

Python 3 installed on your machine
An OpenAI API key
Basic knowledge of Python programming

OpenAI Assistant Concepts

Assistant: An Assistant in the Assistants API is an entity configured to respond to user messages. It uses instructions, a chosen model, and tools to interact with functions and provide answers.

Thread: A Thread represents a conversation or dialogue in the Assistants API. It is created for each user interaction and can contain multiple Messages, serving as a container for the ongoing conversation.

Message: A Message is a unit of communication in a Thread. It contains text (and potentially files in the future) and is used to convey user queries or assistant responses within a Thread.

Run: A Run is an instance of the Assistant processing a Thread. It involves reading the Thread, deciding whether to call tools, and generating responses based on the model's interpretation of the Thread's Messages.
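
To make the relationship between these four concepts concrete, here is a toy in-memory model. None of these classes come from the OpenAI SDK; ToyAssistant, ToyThread, ToyMessage, and toy_run are hypothetical names I made up purely for illustration, and the "run" just echoes instead of calling a model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyMessage:
    role: str  # "user" or "assistant"
    text: str


@dataclass
class ToyThread:
    # A Thread is a container for the Messages of one conversation.
    messages: List[ToyMessage] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.messages.append(ToyMessage(role, text))


@dataclass
class ToyAssistant:
    name: str
    instructions: str


def toy_run(assistant: ToyAssistant, thread: ToyThread) -> str:
    """Pretend Run: read the Thread, append one assistant reply, report a status."""
    last_user_text = thread.messages[-1].text
    thread.add("assistant", f"{assistant.name} heard: {last_user_text}")
    return "completed"


thread = ToyThread()
thread.add("user", "hi")
status = toy_run(ToyAssistant("Bilbo", "Be polite."), thread)
print(status, "->", thread.messages[-1].text)
```

The point to take away is the shape: the Run does not return the answer directly. It changes the Thread, and you read the answer back from the Thread afterwards.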

Setting Up the Development Environment

The first step is to create a virtual environment using venv and activate it. This will ensure that our dependencies are isolated from the system Python installation:

python3 -m venv venv
source venv/bin/activate

Let's install our only dependency: the openai package:

pip install openai

Create a main.py file. Let's populate it with some basic runtime logic for our CLI app:

while True:
    user_input = input("User: ")
    if user_input.lower() == 'exit':
        print("Exiting the assistant...")
        break
    print(f"Assistant: You said {user_input}")

Try it out by running python3 main.py:

python3 main.py
User: hi
Assistant: You said hi

As you can see, the CLI accepts a User message as input, and our genius Assistant doesn't have a brain 🧠 yet so he just repeats the message right back. Not so smart yet.
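
If you want to try the loop without typing into a terminal, one option is to pull the "brain" out into a function that the loop receives as an argument. This is only a sketch of that idea: chat_loop and echo are hypothetical helpers of mine, and the rest of this post keeps using the plain while loop.

```python
from typing import Callable, Iterable, List


def chat_loop(reply: Callable[[str], str], inputs: Iterable[str]) -> List[str]:
    """Feed each input to `reply` until 'exit'; return the lines we would print."""
    printed = []
    for user_input in inputs:
        if user_input.lower() == "exit":
            printed.append("Exiting the assistant...")
            break
        printed.append(f"Assistant: {reply(user_input)}")
    return printed


def echo(text: str) -> str:
    # Stand-in "brain" with the same behaviour as the loop above.
    return f"You said {text}"


lines = chat_loop(echo, ["hi", "exit", "never reached"])
print("\n".join(lines))
```

Swapping echo for a function that talks to a real model is then a one-argument change.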

The Agent

Now, the fun 😁 (or headaches 🤕) begins. I'll provide all the imports needed for the final class right now, so you don't rack your brain on where things are coming from since I kept imports out of code samples for brevity. Let's start by building an Agent class in a new file agent.py:

import time
import openai
from openai.types.beta.threads.run import Run

class Agent:
    def __init__(self, name: str, personality: str):
        self.name = name
        self.personality = personality
        self.client = openai.OpenAI(api_key="sk-*****")
        self.assistant = self.client.beta.assistants.create(
            name=self.name,
            model="gpt-4-turbo-preview"
        )

In the class constructor, we initialize the OpenAI client as an instance attribute by passing our OpenAI API key. Next, we create an assistant attribute that holds our newly created Assistant. We also store name and personality as instance attributes for later use.
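
Hard-coding the key is fine for a quick experiment, but it is easy to commit it by accident. Below is a minimal sketch of reading it from the environment instead. load_api_key is a hypothetical helper of mine, and OPENAI_API_KEY is the variable name I am assuming you export in your shell.

```python
import os


def load_api_key(env=None, var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Demo with a fake environment so the example runs without a real key.
demo_key = load_api_key({"OPENAI_API_KEY": "sk-demo"})
print(demo_key)
```

In the constructor you would then write openai.OpenAI(api_key=load_api_key()) instead of pasting the key into the source file.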

The name argument we are passing to the create method is just for identifying the Assistant in the OpenAI dashboard, and the AI is not actually aware of it at this point. You have to pass the name in the instructions, which we will see later.

You could already set instructions when creating the Assistant, but doing so makes your Assistant less flexible to dynamic changes.

You can update an Assistant by calling client.beta.assistants.update, but there is a better place to pass in dynamic values that we will see when we get to Runs.

Note that if you pass instructions here and then again when creating a Run, the Assistant's instructions will be overwritten by the instructions of the run. They do not complement each other, so choose one based on your needs: Assistant level for static instructions or Run level for dynamic instructions.
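
The override rule is easy to get wrong, so here is a tiny model of it in plain Python. It only mirrors the behavior described above and does not call the API; effective_instructions is a hypothetical function name.

```python
from typing import Optional


def effective_instructions(assistant_level: Optional[str],
                           run_level: Optional[str]) -> Optional[str]:
    """Model of the rule above: run-level instructions replace, they never merge."""
    return run_level if run_level is not None else assistant_level


static_only = effective_instructions("You are Bilbo.", None)
overridden = effective_instructions("You are Bilbo.", "You are Gandalf.")
print(static_only)
print(overridden)
```

Notice that in the second call nothing of "You are Bilbo." survives, which is why anything you still need has to be repeated in the Run instructions.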

For the model, I chose the gpt-4-turbo-preview model so that we can add function calling in part 2 of this series. You could use gpt-3.5-turbo if you want to save a few fractions of a penny while giving yourself a migraine of pure frustration down the line when we implement tools.

GPT 3.5 is terrible at calling tools; the hours I've lost trying to deal with it allow me to say that. 😝 I'll leave it at that, and more on this later.

Creating a Thread, Adding Messages, and Retrieving the Last Message

After we create an agent, we will need to start a conversation thread.

class Agent:
    # ... (rest of code)
    def create_thread(self):
        self.thread = self.client.beta.threads.create()

And we will want a way to add messages to that thread:

class Agent:
    # ... (rest of code)
    def add_message(self, message):
        self.client.beta.threads.messages.create(
            thread_id=self.thread.id,
            role="user",
            content=message
        )

Note that at the moment, it is only possible to add messages with the role user. I believe OpenAI plans on changing this in a future release as this is pretty limiting.

Now, we can get the last message in the thread:

class Agent:
    # ... (rest of code)
    def get_last_message(self):
        return self.client.beta.threads.messages.list(
            thread_id=self.thread.id
        ).data[0].content[0].text.value
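
The chain .data[0].content[0].text.value relies on two things: the list comes back newest first, and the first content part of that message is text. Here is a slightly more defensive sketch of the second half. It runs on stand-in objects rather than real SDK responses, first_text is a hypothetical helper, and I am assuming each content part exposes a type attribute equal to "text" for text parts.

```python
from types import SimpleNamespace as NS


def first_text(message) -> str:
    """Return the first text part of a message, or '' if it has none."""
    for part in message.content:
        # Assumption: text parts are tagged with type == "text".
        if getattr(part, "type", None) == "text":
            return part.text.value
    return ""


# Stand-in objects shaped like the attributes get_last_message reads.
image_part = NS(type="image_file")
text_part = NS(type="text", text=NS(value="Good morning!"))
newest_first = [
    NS(role="assistant", content=[image_part, text_part]),
    NS(role="user", content=[NS(type="text", text=NS(value="hi"))]),
]

print(first_text(newest_first[0]))
```

For this part of the series every message is plain text, so the short version in get_last_message is enough.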

Next, we create an entry point run_agent method to test out what we have so far. Currently, the run_agent method just returns the last message in the thread. It doesn't actually perform a Run. It's still brainless.

class Agent:
    # ... (rest of code)
    def run_agent(self):
        message = self.get_last_message()
        return message

Back in main.py, we create the agent and our first thread. We add a message to the thread, then return that same message back to the user, but this time coming from that live thread.

from agent import Agent

agent = Agent(
    name="Bilbo Baggins",
    personality="You are the accomplished and renowned adventurer from The Hobbit. You act like you are a bit of a homebody, but you are always up for an adventure. You worry a bit too much about breakfast.",
)

agent.create_thread()

while True:
    user_input = input("User: ")
    if user_input.lower() == 'exit':
        print("Exiting the agent...")
        break
    agent.add_message(user_input)
    answer = agent.run_agent()
    print(f"Assistant: {answer}")

Let's run it:

python3 main.py
User: hi
Assistant: hi

Still not very smart. Closer to a parrot 🦜 than a hobbit. In the next section, the real fun begins.

Creating and Polling a Run

When you create a run, you need to periodically retrieve the Run object to check the status of the run. This is called polling, and it sucks. You need to poll in order to determine what your agent should do next. OpenAI plans to add support for streaming to make this simpler. In the meantime, I will show you how to set up polling in this next section.

Note the leading _ on the following method names. This is the Python convention for indicating that a method is intended for internal use and should not be called directly by external code.

First, let's create a helper method _create_run for creating a Run, and update run_agent to call this method:

class Agent:
    # ... (rest of code)
    def get_breakfast_count_from_db(self):
        return 1

    def _create_run(self):
        count = self.get_breakfast_count_from_db()
        return self.client.beta.threads.runs.create(
            thread_id=self.thread.id,
            assistant_id=self.assistant.id,
            instructions=f"""
                Your name is: {self.name}
                Your personality is: {self.personality}

                Metadata related to this conversation:
                {{
                    "breakfast_count": {count}
                }}
            """,
        )

    def run_agent(self):
        run = self._create_run() # add this line
        message = self.get_last_message()
        return message

Notice how we pass the thread.id and assistant.id to create a run.

Remember how I said at the beginning that there was a better place to pass in dynamic instructions and data? That would be the instructions parameter when creating the Run. In our case, we could have the breakfast count be fetched from a database. This will allow you to easily pass in different relevant dynamic data every time you want to trigger an answer.

Now, your agent is aware of the world changing around it and can act accordingly. I like to have a metadata JSON object in my instructions that keeps relevant dynamic context. This allows me to pass in data while being less verbose and in a format that the LLM understands really well.
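
Once the metadata grows beyond one field, hand-writing the JSON inside an f-string (with doubled braces) gets fiddly. A minimal sketch of an alternative is to let json.dumps render it. build_instructions is a hypothetical helper, and the wording of the prompt is copied from the method above.

```python
import json


def build_instructions(name: str, personality: str, metadata: dict) -> str:
    """Render the Run instructions with the metadata as a real JSON object."""
    lines = [
        f"Your name is: {name}",
        f"Your personality is: {personality}",
        "",
        "Metadata related to this conversation:",
        json.dumps(metadata, indent=4),
    ]
    return "\n".join(lines)


text = build_instructions("Bilbo Baggins", "A homebody.", {"breakfast_count": 1})
print(text)
```

Because the metadata is a normal dict, adding another dynamic value is just one more key, and the output is always valid JSON.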

Don't run this yet; it won't work because we aren't waiting for the run to complete when we are getting the last message, so it will still be the last user message.

Let's solve this by building out our polling mechanism. First, we will need a way to repeatedly and easily retrieve a run, so let's add a _retrieve_run method:

class Agent:
    # ... (rest of code)
    def _retrieve_run(self, run: Run):
        return self.client.beta.threads.runs.retrieve(
            run_id=run.id, thread_id=self.thread.id)

Notice how we need to pass both the run.id and thread.id to find a specific run.

Add a _poll_run method to our Agent class:

class Agent:
    # ... (rest of code)
    def _cancel_run(self, run: Run):
        self.client.beta.threads.runs.cancel(
            run_id=run.id, thread_id=self.thread.id)

    def _poll_run(self, run: Run):
        status = run.status
        start_time = time.time()
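        while status != "completed":
            # Stop on terminal states that will never reach "completed".
            if status == 'failed':
                raise Exception(f"Run failed with error: {run.last_error}")
            if status == 'expired':
                raise Exception("Run expired.")
            if status == 'cancelled':
                raise Exception("Run was cancelled.")

            # Check the time budget before sleeping again, so a run that
            # has just completed is never cancelled by mistake.
            elapsed_time = time.time() - start_time
            if elapsed_time > 120:  # 2 minutes
                self._cancel_run(run)
                raise Exception("Run took longer than 2 minutes.")

            time.sleep(1)
            run = self._retrieve_run(run)
            status = run.status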

🥵 Phew, that's a lot... Let's unpack it.

_poll_run receives a Run object as an argument and extracts the current Run status. All the available statuses can be found in the OpenAI docs. We'll just use a few that suit our current purpose.

We now run a while loop to check for a completed status while handling a few error scenarios. The actual billing of the Assistants API is a bit murky, so to be on the safe side, I opted to cancel my runs after 2 minutes, even though there is an expired status for when OpenAI cancels runs after 10 minutes. If a run takes more than 2 minutes, you probably have a problem anyway.

Since I also don't want to poll every few milliseconds, I throttle my request by only polling every 1 second until I hit the 2-minute mark and cancel my run. You can adjust this to whatever you see fit.

On each iteration, after the delay, we retrieve the Run again and read its latest status.
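
The same poll-with-timeout pattern can be written independently of the SDK, which makes it easy to try out without spending API credits. This is a sketch under my own assumptions, not the SDK's API: poll_until_done is a hypothetical helper, and the status strings are the ones I understand Runs to use.

```python
import time
from typing import Callable

# Assumption: these are the terminal states that can never become "completed".
TERMINAL_FAILURES = {"failed", "expired", "cancelled"}


def poll_until_done(fetch_status: Callable[[], str],
                    timeout: float = 120.0,
                    interval: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> str:
    """Call fetch_status until it returns 'completed', fails, or time runs out."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status == "completed":
            return status
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"Run ended with status: {status}")
        if clock() >= deadline:
            raise TimeoutError(f"Run still '{status}' after {timeout} seconds.")
        sleep(interval)


# Demo with a scripted sequence of statuses and no real waiting.
scripted = iter(["queued", "in_progress", "completed"])
naps = []
result = poll_until_done(lambda: next(scripted), sleep=naps.append)
print(result, len(naps))
```

Unlike _poll_run, this sketch does not cancel anything on timeout. The caller would catch TimeoutError and call _cancel_run itself, which keeps the waiting logic separate from the cleanup.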

Now, let's plug all that into our run_agent method. We first create the run with _create_run, then poll with _poll_run until the run completes or an error is raised. Once polling finishes, we retrieve the last message from the thread, which will now be from the agent.

We then return the message to our runtime loop, so it can be sent back to the user.

class Agent:
    # ... (rest of code)
    def run_agent(self):
        run = self._create_run()
        self._poll_run(run) # add this line
        message = self.get_last_message()
        return message

Voilà, now when you run your agent again, you will get a reply from our friendly Agent:

python3 main.py
User: hi
Assistant: Hello there! What adventure can we embark on today? Or perhaps, before we set out, we should think about breakfast. Have you had yours yet? I've had mine, of course – can't start the day without a proper breakfast, you know.
User: how many breakfasts have you had?
Assistant: Ah, well, I've had just 1 breakfast today. But the day is still young, and there's always room for a second, isn't there? What about you? How can I assist you on this fine day?

In part 2, we will add the ability for our Agent to call tools.

You can find the full code on my GitHub.

Thank you for reading. Happy to hear any thoughts and feedback in the comments. Follow me on LinkedIn for more content like this: https://www.linkedin.com/in/jean-marie-dalmasso-1b5473141/