11,762 讀數

Beep Beep Bop Bop：如何使用本地 LLM 部署多个 AI 代理

经过 Baby Commando10m2023/10/19

太長; 讀書

使用 Llama2 和 Mistral-7b 等本地 LLM 部署多个本地 Ai 代理。

featured image - Beep Beep Bop Bop：如何使用本地 LLM 部署多个 AI 代理

使用 Llama2 和 Mistral-7b 等本地 LLM 部署多个本地 Ai 代理。

“永远不要派人来做机器的工作”
— 史密斯特工

您是否正在寻找一种方法，使用本地法学硕士而不是付费 OpenAi 来通过 Autogen 建立一支由有组织的人工智能代理组成的大军？那么您来对地方了！

聊天法学硕士很酷，但作为智能代理采取行动是下一个层次。他们中的许多人又怎样呢？了解 Microsoft 最新的Autogen项目。

但有一个问题。 Autogen 默认情况下是为了与 OpenAi 挂钩而构建的，这是有限制的、昂贵的且经过审查/无感知的。这就是为什么在本地使用像Mistral-7B这样的简单法学硕士是最好的方法。您还可以使用您选择的任何其他模型，例如Llama2 、 Falcon 、 Vicuna 、 Alpaca ，天空（您的硬件）确实是极限。

秘诀是在本地 LLM 服务器中使用 openai JSON 样式的输出，例如 Oobabooga 的 text-generation-webui，然后将其挂接到 autogen。这就是我们今天正在构建的。

请注意，还有其他方法可以以 openai apis 格式制作 llms 吐出文本，以及 llama.cpp python 绑定。

在本教程中，我们将： 0. 获取 Oobabooga 的文本生成 webui、LLM (Mistral-7b) 和 Autogen

在 Oobabooga 上设置 OpenAi 格式扩展
使用 OpenAi 格式启动本地 LLM 服务器
将其连接到 Autogen

让我们开始吧！

0. 获得 Oobabooga 的 Text-Generation-Webui、LLM (Mistral-7b) 和 Autogen

在继续之前，建议在安装 pip 软件包时使用虚拟环境。如果您愿意的话，制作一个新的并激活它。

获取 Obbabooga 的文本生成 Webui：这是一个众所周知的程序，用于在本地计算机上托管 LLM。前往text- Generation-webui页面并按照安装指南进行操作。上手非常简单。如果您使用 NVIDIA GPU 进行加速，您可能还需要下载CUDA 。

获得 LLM (Mistral-7b-Instruct)：下载文本生成 WebUI 后，先不要启动它。我们需要获得法学硕士学位才能为我们的代理人注入活力。

今天我们将探索Mistral-7B ，特别是Mistral-7B-instruct-v0.1.Q4_K_S.gguf ，这是 TheBloke 模型的优化版本。您可以根据描述中的说明选择最适合您的机器的优化型号。

您可以根据您的硬件选择更小或更大的型号。不过，不要害怕在计算机上尝试一些东西，我们在这里创造科学。

前往文件和版本页面，并获取以下内容：

配置.json
Mistral-7B-instruct-v0.1.Q4_K_S.gguf （在大多数中间设置中运行良好）

下载后，前往 text- Generation-webui 安装文件夹，然后在其中打开 models 文件夹。在其中，使用模型名称（或任何您想要的名称）创建一个新文件夹，例如“mistral-7b-instruct” 。路径将是这样的：

 C:/.../text-generation-webui/models/mistral-7b-instruct

将config.json文件和model.gguf放入新文件夹中。

获取 Autogen ：
要安装 Microsoft 的多代理 Python 库，只需在终端中使用 pip 包安装程序进行安装即可。

 pip install pyautogen

1. 在 Oobabooga 上设置 OpenAi 格式扩展

安装全新的文本生成 webui 并下载 LLM 后，我们可以继续让您的本地 Oobabooga 服务器以 OpenAi JSON 格式说话。您可以了解有关 OpenAi API 格式的更多信息，并在其文档。

要将 Autogen 与我们的本地服务器挂钩，我们需要激活 Ooobaboga 的 text- Generation-webui 扩展文件夹中的“openai”扩展。

在终端中前往“text- Generation-webui/extensions/openai”文件夹并在那里安装其要求：

 pip install -r requirements.txt

2.以OpenAi格式启动本地LLM服务器

现在返回终端中的/text- Generation-webui根文件夹。是时候让这个婴儿启动并运行了。

顾名思义，它旨在用作 WebUI，但您也可以将其作为服务器运行，以从您制作的其他程序查询 api。

要将其作为本地服务器启动并使用 openai api 扩展，请根据您当前的操作系统使用以下命令。

不要忘记将“model”参数更改为我们之前在 /models 创建的文件夹名称。 （在我的例子中，我将文件夹命名为 **“**mistral-7b-instruct”）

视窗：

 ./start_windows.bat --extensions openai --listen --loader llama.cpp --model mistral-7b-instruct

Linux：

 ./start_linux.sh --extensions openai --listen --loader llama.cpp --model mistral-7b-instruct

苹果系统：

 ./start_macos.sh --extensions openai --listen --loader llama.cpp --model mistral-7b-instruct

我们传递扩展 openai参数来加载扩展，侦听启动服务器，我们可以从 autogen、 loader和model查询，它们指定模型的加载器以及我们之前创建的模型文件夹名称，以及 config.json 和模型。 gguf 文件。

如果一切顺利，您可能会看到如下内容：

像往常一样，webui 在您的本地主机端口 7860 上运行，但请注意，我们的 OpenAI 兼容 api 也已准备好供 Autogen 在我们的本地主机上使用： http://127.0.0.1:5001/v1 。

3. 将其连接到 Autogen

此时，您已经安装了 autogen lib，因此是时候导入它并插入我们的 LLM 服务器了。

让我们从一些简单的事情开始，一个代理与人类（你）交互。在任意位置创建一个新目录，并在其中添加一个新的autogen.py文件。您还可以根据需要重命名该文件。

一般来说，为了简单地连接到 OpenAi GPT 的 API，您可以像这样启动文件：

 import autogen #start importing the autogen lib config_list = [ { 'model': 'gpt-3.5-turbo', 'api_key': 'your openai real key here' } ]

但是要使用正在运行的本地服务器，我们可以像这样启动它：

 import autogen #start importing the autogen lib config_list = [ { "model": "mistral-instruct-7b", #the name of your running model "api_base": "http://127.0.0.1:5001/v1", #the local address of the api "api_type": "open_ai", "api_key": "sk-111111111111111111111111111111111111111111111111", # just a placeholder } ]

由于您不需要真正的密钥即可在本地工作，因此我们仅使用 sk-1111… 占位符。

接下来，我们可以设置代理和人类用户。阅读评论以获得更好的理解。

 import autogen #start importing the autogen lib config_list = [ { "model": "mistral-instruct-7b", #the name of your running model "api_base": "http://127.0.0.1:5001/v1", #the local address of the api "api_type": "open_ai", "api_key": "sk-111111111111111111111111111111111111111111111111", # just a placeholder } ] # create an ai AssistantAgent named "assistant" assistant = autogen.AssistantAgent( name="assistant", llm_config={ "seed": 42, # seed for caching and reproducibility "config_list": config_list, # a list of OpenAI API configurations "temperature": 0, # temperature for sampling "request_timeout": 400, # timeout }, # configuration for autogen's enhanced inference API which is compatible with OpenAI API ) # create a human UserProxyAgent instance named "user_proxy" user_proxy = autogen.UserProxyAgent( name="user_proxy", human_input_mode="NEVER", max_consecutive_auto_reply=10, is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"), code_execution_config={ "work_dir": "agents-workspace", # set the working directory for the agents to create files and execute "use_docker": False, # set to True or image name like "python:3" to use docker }, ) # the assistant receives a message from the user_proxy, which contains the task description user_proxy.initiate_chat( assistant, message="""Create a posting schedule with captions in instagram for a week and store it in a .csv file.""", )

请记住将 message=”...” 更改为您的初始订单。

如果您只是运行包含该消息的脚本，您可能会看到一个名为“agents-workspace”的新目录，其中包含由代理“手动”创建的 .csv 文件。

现在让我们来做一些更高级的事情。
具有角色和上下文的多个代理。

这将像您所知道的任何消息应用程序一样像“聊天组”一样工作。他们的上下文（系统消息）将告诉他们如何行为，以及他们应该遵守哪个层次结构。这次我们将有：

两个人：管理员和执行者。
四个代理人：工程师、科学家、规划者和评论家。

 import autogen #Use the local LLM server same as before config_list = [ { "model": "mistral-instruct-7b", #the name of your running model "api_base": "http://127.0.0.1:5001/v1", #the local address of the api "api_type": "open_ai", "api_key": "sk-111111111111111111111111111111111111111111111111", # just a placeholder } ] # set a "universal" config for the agents agent_config = { "seed": 42, # change the seed for different trials "temperature": 0, "config_list": config_list, "request_timeout": 120, } # humans user_proxy = autogen.UserProxyAgent( name="Admin", system_message="A human admin. Interact with the planner to discuss the plan. Plan execution needs to be approved by this admin.", code_execution_config=False, ) executor = autogen.UserProxyAgent( name="Executor", system_message="Executor. Execute the code written by the engineer and report the result.", human_input_mode="NEVER", code_execution_config={"last_n_messages": 3, "work_dir": "paper"}, ) # agents engineer = autogen.AssistantAgent( name="Engineer", llm_config=agent_config, system_message='''Engineer. You follow an approved plan. You write python/shell code to solve tasks. Wrap the code in a code block that specifies the script type. The user can't modify your code. So do not suggest incomplete code which requires others to modify. Don't use a code block if it's not intended to be executed by the executor. Don't include multiple code blocks in one response. Do not ask others to copy and paste the result. Check the execution result returned by the executor. If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try. ''', ) scientist = autogen.AssistantAgent( name="Scientist", llm_config=agent_config, system_message="""Scientist. You follow an approved plan. You are able to categorize papers after seeing their abstracts printed. You don't write code.""" ) planner = autogen.AssistantAgent( name="Planner", system_message='''Planner. Suggest a plan. Revise the plan based on feedback from admin and critic, until admin approval. The plan may involve an engineer who can write code and a scientist who doesn't write code. Explain the plan first. Be clear which step is performed by an engineer, and which step is performed by a scientist. ''', llm_config=agent_config, ) critic = autogen.AssistantAgent( name="Critic", system_message="Critic. Double check plan, claims, code from other agents and provide feedback. Check whether the plan includes adding verifiable info such as source URL.", llm_config=agent_config, ) # start the "group chat" between agents and humans groupchat = autogen.GroupChat(agents=[user_proxy, engineer, scientist, planner, executor, critic], messages=[], max_round=50) manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=agent_config) # Start the Chat! user_proxy.initiate_chat( manager, message=""" find papers on LLM applications from arxiv in the last week, create a markdown table of different domains. """, ) # to followup of the previous question, use: # user_proxy.send( # recipient=assistant, # message="""your followup response here""", # )

好了，你有了新的特工军队。