Contents
- Overview of agents, tools, and executors
- How they are used in practice
- Detailed walkthrough of the mechanics that allow agents to work autonomously in the real world
- How I used AI to generate this article
An overview of agents and executors
Agents
Agents are classes that use language models to perform tasks, answer questions, or solve problems with the help of various tools. They are often used to interact with data sources to assist with problem-solving.
Executors
An executor, on the other hand, is a running instance of an agent. It executes tasks based on the agent’s decisions and the configured tools.
Tools
Tools are instances of classes that perform specific tasks or provide specific utilities. These classes are derived from a base class called Tool.
Examples of tools include:
- SerpAPI: Uses the SerpAPI service for search engine result data.
- ReadFileTool: Provides file reading capabilities.
- WriteFileTool: Provides file writing capabilities.
The agents decide which tools to use, and the executors execute them.
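To make the split between "deciding" and "executing" concrete, here is a minimal sketch. Everything in it is hypothetical: SketchTool, UppercaseTool, WordCountTool, and runTool are stand-ins I made up for illustration, not LangChain's actual Tool class or API.

```typescript
// Hypothetical stand-in for a tool base class (NOT LangChain's real Tool).
abstract class SketchTool {
  abstract name: string;
  abstract description: string;
  abstract run(input: string): string;
}

// Two toy tools derived from the base class.
class UppercaseTool extends SketchTool {
  name = "uppercase";
  description = "Returns the input text in upper case.";
  run(input: string): string {
    return input.toUpperCase();
  }
}

class WordCountTool extends SketchTool {
  name = "word-count";
  description = "Counts the whitespace-separated words in the input.";
  run(input: string): string {
    const words = input.trim().split(/\s+/).filter((w) => w.length > 0);
    return String(words.length);
  }
}

// The agent side only names a tool; the executor side looks it up and runs it.
function runTool(tools: SketchTool[], toolName: string, toolInput: string): string {
  const tool = tools.find((t) => t.name === toolName);
  if (!tool) {
    return `${toolName} is not a valid tool`;
  }
  return tool.run(toolInput);
}

const sketchTools: SketchTool[] = [new UppercaseTool(), new WordCountTool()];
console.log(runTool(sketchTools, "word-count", "agents pick tools")); // prints "3"
```

The point of the sketch is the lookup by name: the decision is just a string naming a tool, and running the tool is a separate step.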
Let’s start with an example
We will run through a quick workflow of how we actually use agents and executors in practice.
Initialisation
When you initialise an AgentExecutor, it accepts an AgentExecutorInput, which contains the agent, tools, maximum iterations, and an optional early stopping method. The AgentExecutor constructor sets the following properties:
- agent
- tools
- returnIntermediateSteps
- maxIterations (default is 15)
- earlyStoppingMethod (default is force)
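As a rough illustration of how those defaults could be applied, here is a small sketch. SketchExecutorInput and withDefaults are hypothetical names of my own, not the library's types, and the false default for returnIntermediateSteps is an assumption, since only the maxIterations and earlyStoppingMethod defaults are stated above.

```typescript
// Hypothetical input shape; field names follow the property list above.
interface SketchExecutorInput {
  agent: unknown;
  tools: unknown[];
  returnIntermediateSteps?: boolean;
  maxIterations?: number;
  earlyStoppingMethod?: string;
}

// Fill in any omitted optional fields.
function withDefaults(input: SketchExecutorInput): Required<SketchExecutorInput> {
  return {
    agent: input.agent,
    tools: input.tools,
    // Assumption: the source does not state this default.
    returnIntermediateSteps: input.returnIntermediateSteps ?? false,
    // Defaults stated in the article: 15 iterations, "force" early stopping.
    maxIterations: input.maxIterations ?? 15,
    earlyStoppingMethod: input.earlyStoppingMethod ?? "force",
  };
}

const sketchConfig = withDefaults({ agent: {}, tools: [] });
console.log(sketchConfig.maxIterations, sketchConfig.earlyStoppingMethod);
```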
You can create an AgentExecutor using AgentExecutor.fromAgentAndTools and providing the required input fields. Here’s a working example from the LangChain repository:
import { AgentExecutor, ZeroShotAgent } from "langchain/agents";
import { OpenAI } from "langchain/llms/openai";
import { SerpAPI } from "langchain/tools";
import { Calculator } from "langchain/tools/calculator";
export const run = async () => {
  const model = new OpenAI({ temperature: 0 });
  const tools = [
    new SerpAPI(process.env.SERPAPI_API_KEY, {
      location: "Austin,Texas,United States",
      hl: "en",
      gl: "us",
    }),
    new Calculator(),
  ];

  const agent = new ZeroShotAgent({ allowedTools: ["search", "calculator"] });
  const agentExecutor = AgentExecutor.fromAgentAndTools({ agent, tools });
  console.log("Loaded agent.");

  const input = `Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?`;
  console.log(`Executing with input "${input}"...`);

  const result = await agentExecutor.call({ input });
  console.log(`Got output ${result.output}`);
};
In the code above, we are:
- Using OpenAI as the language model
- Creating a list of tools, namely SerpAPI and Calculator
- Instantiating a type of agent known as the ZeroShotAgent (there are many types of agents with different use cases, or you can create your own!)
- Instantiating an AgentExecutor with the agent and tools
- Using the .call method on the agent executor to receive the computed output from the agent
The execution process of an agent executor
When you call the .call() method of the AgentExecutor, it triggers the execution process. This is actually just a loop that performs the following actions:
- The agent creates a plan (it’s just text output by the language model) using the previous steps and inputs in the prompt.
- The agent parses the text output and produces either an AgentFinish or an AgentAction. For instance, if the parsed text contains the prefix declared in the variable FINAL_ANSWER_ACTION (by default it’s Final Answer:), the agent returns an instance of AgentFinish. This enables the AgentExecutor to interpret the decision made by the language model and carry out actual executions of tools.
- If the executor receives an AgentFinish object, the execution loop terminates and the output is returned using the getOutput function, which computes the final output based on the agent’s finish step, intermediate steps, and additional data from the agent.
- If the executor receives an AgentAction object, it processes the actions returned by the agent plan, calling the corresponding tool for each action and generating observations.
- The action and observation in each step are added to the execution steps array.
- The loop continues until the maximum number of iterations is reached or the shouldContinue function determines that it should stop.
- If the maximum number of iterations is reached or early stopping is triggered, the agent returns a stopped response, and the output is returned using the getOutput function.
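The parsing step in the list above can be sketched roughly like this. It is a simplified illustration, not the library's parser: the ParsedAction and ParsedFinish types, the FINAL_ANSWER_PREFIX constant, the parseModelOutput function, and the "Action: ... / Action Input: ..." text format are all assumptions made for the example. Only the "Final Answer:" prefix comes from the description above.

```typescript
// Hypothetical result shapes, loosely mirroring AgentAction / AgentFinish.
type ParsedAction = { kind: "action"; tool: string; toolInput: string; log: string };
type ParsedFinish = { kind: "finish"; output: string; log: string };

// Default prefix mentioned in the article for a final answer.
const FINAL_ANSWER_PREFIX = "Final Answer:";

function parseModelOutput(text: string): ParsedAction | ParsedFinish {
  // If the model declared a final answer, stop and return it.
  const finishIndex = text.indexOf(FINAL_ANSWER_PREFIX);
  if (finishIndex !== -1) {
    const output = text.slice(finishIndex + FINAL_ANSWER_PREFIX.length).trim();
    return { kind: "finish", output, log: text };
  }

  // Assumed format: "Action: <tool name>" followed by "Action Input: <input>".
  const match = /Action:\s*(.*?)\s*\nAction Input:\s*([\s\S]*)/.exec(text);
  if (!match) {
    throw new Error(`Could not parse model output: ${text}`);
  }
  return { kind: "action", tool: match[1].trim(), toolInput: match[2].trim(), log: text };
}

const parsedAction = parseModelOutput(
  "Thought: I should look this up.\nAction: search\nAction Input: Olivia Wilde boyfriend"
);
const parsedFinish = parseModelOutput("Thought: I now know the answer.\nFinal Answer: 2.17");
console.log(parsedAction.kind, parsedFinish.kind); // prints "action finish"
```

The takeaway is that the "decision" is nothing more than structured data recovered from plain text, which is what lets the executor act on it.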
This loop of planning, parsing, and executing tools lets agents draw on the decision-making power of language models, which makes it possible to build autonomous entities that can carry out more complicated tasks.
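Here is a stripped-down sketch of that loop, with a scripted agent standing in for a real language model. All names (LoopAgent, LoopTools, runLoop, scriptedAgent) and the stopped-response message are placeholders of my own, and the loop is kept synchronous for brevity, whereas the example above awaits .call.

```typescript
type LoopAction = { kind: "action"; tool: string; toolInput: string };
type LoopFinish = { kind: "finish"; output: string };
type LoopStep = { action: LoopAction; observation: string };

// Hypothetical agent interface: given the steps so far, decide what to do next.
interface LoopAgent {
  plan(steps: LoopStep[], input: string): LoopAction | LoopFinish;
}

type LoopTools = Record<string, (toolInput: string) => string>;

function runLoop(
  agent: LoopAgent,
  tools: LoopTools,
  input: string,
  maxIterations = 15
): { output: string; steps: LoopStep[] } {
  const steps: LoopStep[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const decision = agent.plan(steps, input);
    // A finish decision ends the loop and returns the final output.
    if (decision.kind === "finish") {
      return { output: decision.output, steps };
    }
    // Otherwise run the chosen tool and record the action and observation.
    const tool = tools[decision.tool];
    const observation = tool
      ? tool(decision.toolInput)
      : `${decision.tool} is not a valid tool`;
    steps.push({ action: decision, observation });
  }
  // Iteration limit reached: return a stopped response (placeholder wording).
  return { output: "Agent stopped due to max iterations.", steps };
}

// Scripted agent: call the calculator once, then finish with its observation.
const scriptedAgent: LoopAgent = {
  plan(steps) {
    if (steps.length === 0) {
      return { kind: "action", tool: "calculator", toolInput: "2 + 3" };
    }
    return { kind: "finish", output: `The answer is ${steps[0].observation}` };
  },
};

// Toy calculator that only understands "a + b".
const loopTools: LoopTools = {
  calculator: (expr) => {
    const [a, b] = expr.split("+").map((part) => Number(part.trim()));
    return String(a + b);
  },
};

const loopResult = runLoop(scriptedAgent, loopTools, "What is 2 + 3?");
console.log(loopResult.output); // prints "The answer is 5"
```

Swapping the scripted agent for one that calls a language model and parses its text output gives you the shape of the real thing.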
How I made this technical blog post using AI
Interestingly, about 80% of this article was generated using a tool I built called Genie (with LangChain). The goal for Genie is to help developers understand complex code implementations in minutes instead of hours. Do give it a go if you’re interested: www.birdlabs.ai
Here are some of the questions I asked in order to understand the mechanics behind autonomous agents.
1. Getting a brief overview of the agents and executors
2. Realising I need more information on what tools are
3. Diving deeper into the implementation of an agent executor
4. Realising I need to know how the Agent interprets the responses from the language model
That’s all! Follow me for more coding tutorials!
I’ll be dropping more tutorials that use Genie! Follow my blog and let me know what you want to learn about next :)
Let’s keep in touch!
LinkedIn: https://www.linkedin.com/in/dion-neo-470a161a6/
Email: dion@birdlabs.ai
Twitter: @neowenshun