Beep Beep Bop Bop: How to Deploy Multiple AI Agents Using Local LLMs

Deploying multiple local Ai agents using local LLMs like Llama2 and Mistral-7b.

“Never Send A Human To Do A Machine’s Job”

— Agent Smith

Are you searching for a way to build a whole army of organized ai agents with Autogen using local LLMs instead of the paid OpenAi? Then you came to the right place!

Chat LLMs are cool, but giving taking action as an intelligent agent is next level. What about many of them? Meet Microsoft’s latest Autogen project.

But there is a problem. Autogen was built to be hooked to OpenAi by default, wich is limiting, expensive and censored/non-sentient. That’s why using a simple LLM locally like Mistral-7B is the best way to go. You can also use with any other model of your choice such as Llama2, Falcon, Vicuna, Alpaca, the sky (your hardware) is really the limit.

The secret is to use openai JSON style of output in your local LLM server such as Oobabooga’s text-generation-webui, then hook it to autogen. That’s what we’re building today.

Note there are other methods for making llms spit text in openai apis format as well like the llama.cpp python bindings.

In this tutorial we will be: 0. Getting Oobabooga’s text-generation-webui, an LLM (Mistral-7b) and Autogen

Setting up OpenAi format extension on Oobabooga
Starting the local LLM server with the OpenAi format
Hooking it to Autogen

Let’s get started!

0. Getting Oobabooga’s Text-Generation-Webui, an LLM (Mistral-7b) and Autogen

Before proceeding, it’s recommended to use a virtual environment when installing pip packages. Make a new one and activate it if you feel like.

Getting Obbabooga’s Text Generation Webui: This is a well known program for hosting LLMs in your local machine. Head to text-generation-webui’s page and follow the installation guide. It is very straight forward to get started. You might also want to download CUDA if you are using an NVIDIA gpu for acceleration.

Getting an LLM (Mistral-7b-Instruct): After downloading the text generation webui, don’t start it just yet. We need to get a LLM to give life to our agents.

Today we’ll be exploring Mistral-7B, specifically Mistral-7B-instruct-v0.1.Q4_K_S.gguf, an optimized version of the model by TheBloke. You can choose the optimized model perfect for your machine based on the explanation in the description.

You can choose smaller or bigger models depending on your hardware. Don’t be too scared to try out things in your computer tho, we’re making science here.

Head to the Files and Versions page, and grab the following:

config.json
Mistral-7B-instruct-v0.1.Q4_K_S.gguf (will run well in most mid setups)

Once downloaded, head to the text-generation-webui installation folder, and inside it open the models folder. In there, create a new folder with the name of your model (or any name you want), like “mistral-7b-instruct”. The path will be like this:

C:/.../text-generation-webui/models/mistral-7b-instruct

Place both the config.json file and the model.gguf in the new folder.

Getting Autogen:
To install Microsoft’s multi-agent making python library just install it using the pip package installer in your terminal.

pip install pyautogen

1. Setting Up OpenAi Format Extension on Oobabooga

With your brand new text-generation-webui installed and the LLM downloaded, we can proceed on making your local Oobabooga server speak in OpenAi JSON format. You can learn more about OpenAi APIs formats and features it in their documentation.

To hook Autogen with our local server, we will need to activate the “openai” extension in the Ooobaboga’s text-generation-webui extensions folder.

In your terminal head to “text-generation-webui/extensions/openai” folder and in there install its requirements:

pip install -r requirements.txt

2. Starting The Local LLM Server In OpenAi Format

Now head back to the /text-generation-webui root folder in your terminal. Its time to get this baby up and running.

As the name says, it was meant to be used as a webui, but you can also just keep it running as a server to query apis from other programs you make.

To boot it as a local server and with the openai api extension, use the following command according to your current OS.

Don’t forget to change the “model” parameter to the folder name we created earlier at /models. (In my case I named the folder **“**mistral-7b-instruct”)

Windows:

./start_windows.bat --extensions openai --listen --loader llama.cpp --model mistral-7b-instruct

Linux:

./start_linux.sh --extensions openai --listen --loader llama.cpp --model mistral-7b-instruct

MacOS:

./start_macos.sh --extensions openai --listen --loader llama.cpp --model mistral-7b-instruct

We pass the extensions openai parameter to load the extension, listen to start a server we can query from autogen, loader and model wich specify the loader for the model and the model folder name we created earlier, with the config.json and the model.gguf files.

If everything goes right, you might see something like this:

The webui is running on your localhost port 7860 as an usual start, but note our OpenAI compatible api is also ready to be used by Autogen at our local host at http://127.0.0.1:5001/v1.

3. Hooking it to Autogen

At this point, you already have the autogen lib installed, so it’s time to import it and plug our LLM server.

Let’s start with something simple, a single agent interacting with a human (you). Create a new directory wherever you like, and add a new autogen.py file there. You can also rename the file as you wish.

Generally to simply connect to OpenAi GPT’s API you would start the file like this:

import autogen #start importing the autogen lib

config_list = [
    {
        'model': 'gpt-3.5-turbo',
        'api_key': 'your openai real key here'

    }
]

But to use our running local server, we initiate it like this:

import autogen #start importing the autogen lib

config_list = [
    {
        "model": "mistral-instruct-7b", #the name of your running model
        "api_base": "http://127.0.0.1:5001/v1", #the local address of the api
        "api_type": "open_ai",
        "api_key": "sk-111111111111111111111111111111111111111111111111", # just a placeholder
    }
]

As you don’t need a real key for working locally, we are just using the sk-1111… placeholder.

Next, we can setup the agent and the human user. Read the comments for better understanding.

import autogen #start importing the autogen lib

config_list = [
    {
        "model": "mistral-instruct-7b", #the name of your running model
        "api_base": "http://127.0.0.1:5001/v1", #the local address of the api
        "api_type": "open_ai",
        "api_key": "sk-111111111111111111111111111111111111111111111111", # just a placeholder
    }
]

# create an ai AssistantAgent named "assistant"
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={
        "seed": 42,  # seed for caching and reproducibility
        "config_list": config_list,  # a list of OpenAI API configurations
        "temperature": 0,  # temperature for sampling
        "request_timeout": 400, # timeout
    },  # configuration for autogen's enhanced inference API which is compatible with OpenAI API
)

# create a human UserProxyAgent instance named "user_proxy"
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10, 
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "agents-workspace", # set the working directory for the agents to create files and execute
        "use_docker": False,  # set to True or image name like "python:3" to use docker
    },
)

# the assistant receives a message from the user_proxy, which contains the task description
user_proxy.initiate_chat(
    assistant,
    message="""Create a posting schedule with captions in instagram for a week and store it in a .csv file.""",
)

Remember to change message=”…” with your initial orders.

If you just run the script with the message, you may see a new directory called “agents-workspace” with a .csv file in there, created “manually” by the agent.

Now let’s go for something a bit more advanced.
Multiple agents with roles and contexts.

This will work like a “chat group” like any messaging app you know. Their contexts (system message) will tell them how to behave, and wich hierarchy they should obey. This time we will have:

Two humans: the admin and the executor.
Four agents: the engineer, the scientist, the planner and the critic.

import autogen

#Use the local LLM server same as before
config_list = [
    {
        "model": "mistral-instruct-7b", #the name of your running model
        "api_base": "http://127.0.0.1:5001/v1", #the local address of the api
        "api_type": "open_ai",
        "api_key": "sk-111111111111111111111111111111111111111111111111", # just a placeholder
    }
]

# set a "universal" config for the agents
agent_config = {
    "seed": 42,  # change the seed for different trials
    "temperature": 0,
    "config_list": config_list,
    "request_timeout": 120,
}

# humans
user_proxy = autogen.UserProxyAgent(
   name="Admin",
   system_message="A human admin. Interact with the planner to discuss the plan. Plan execution needs to be approved by this admin.",
   code_execution_config=False,
)

executor = autogen.UserProxyAgent(
    name="Executor",
    system_message="Executor. Execute the code written by the engineer and report the result.",
    human_input_mode="NEVER",
    code_execution_config={"last_n_messages": 3, "work_dir": "paper"},
)

# agents
engineer = autogen.AssistantAgent(
    name="Engineer",
    llm_config=agent_config,
    system_message='''Engineer. You follow an approved plan. You write python/shell code to solve tasks. Wrap the code in a code block that specifies the script type. The user can't modify your code. So do not suggest incomplete code which requires others to modify. Don't use a code block if it's not intended to be executed by the executor.
Don't include multiple code blocks in one response. Do not ask others to copy and paste the result. Check the execution result returned by the executor.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.
''',
)

scientist = autogen.AssistantAgent(
    name="Scientist",
    llm_config=agent_config,
    system_message="""Scientist. You follow an approved plan. You are able to categorize papers after seeing their abstracts printed. You don't write code."""
)

planner = autogen.AssistantAgent(
    name="Planner",
    system_message='''Planner. Suggest a plan. Revise the plan based on feedback from admin and critic, until admin approval.
The plan may involve an engineer who can write code and a scientist who doesn't write code.
Explain the plan first. Be clear which step is performed by an engineer, and which step is performed by a scientist.
''',
    llm_config=agent_config,
)

critic = autogen.AssistantAgent(
    name="Critic",
    system_message="Critic. Double check plan, claims, code from other agents and provide feedback. Check whether the plan includes adding verifiable info such as source URL.",
    llm_config=agent_config,
)

# start the "group chat" between agents and humans
groupchat = autogen.GroupChat(agents=[user_proxy, engineer, scientist, planner, executor, critic], messages=[], max_round=50)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=agent_config)

# Start the Chat!
user_proxy.initiate_chat(
    manager,
    message="""
find papers on LLM applications from arxiv in the last week, create a markdown table of different domains.
""",
)

# to followup of the previous question, use:
# user_proxy.send(
#     recipient=assistant,
#     message="""your followup response here""",
# )

There you go, you have your new army of agents.

I strongly recommend going deeper in the Autogen documentation to understand what else this kind of agency automation is able to do.

Also, after understanding how autogen works under the hood you may want to use it via a interface like autogen-ui, or maybe create your own in your company’s dashboard.

Now it’s up to you. Orchestrate agents untied from OpenAi limitations, to build a better future for us humans. Always remember that with great power comes great responsibility. So, what are you building next?

This post was completely written by a human™

Also published here.

Beep Beep Bop Bop: How to Deploy Multiple AI Agents Using Local LLMs

Too Long; Didn't Read

People Mentioned

Deploying multiple local Ai agents using local LLMs like Llama2 and Mistral-7b.

Are you searching for a way to build a whole army of organized ai agents with Autogen using local LLMs instead of the paid OpenAi? Then you came to the right place!

0. Getting Oobabooga’s Text-Generation-Webui, an LLM (Mistral-7b) and Autogen

1. Setting Up OpenAi Format Extension on Oobabooga

2. Starting The Local LLM Server In OpenAi Format

3. Hooking it to Autogen

About Author

TOPICS

THIS ARTICLE WAS FEATURED IN...

Trending Topics

Classic

Neon Noir

Minty

Newspaper

HN StartUps

Beep Beep Bop Bop: How to Deploy Multiple AI Agents Using Local LLMs

Too Long; Didn't Read

People Mentioned

Deploying multiple local Ai agents using local LLMs like Llama2 and Mistral-7b.

Are you searching for a way to build a whole army of organized ai agents with Autogen using local LLMs instead of the paid OpenAi? Then you came to the right place!

0. Getting Oobabooga’s Text-Generation-Webui, an LLM (Mistral-7b) and Autogen

1. Setting Up OpenAi Format Extension on Oobabooga

2. Starting The Local LLM Server In OpenAi Format

3. Hooking it to Autogen

About Author

TOPICS

THIS ARTICLE WAS FEATURED IN...

RELATED STORIES

Trending Topics

Classic

Neon Noir

Minty

Newspaper

HN StartUps