
Understanding the Magic Behind Langchain Autonomous Agents

by Dion Neo, June 29th, 2023

Too Long; Didn't Read

This article explains how agents and executors are used in practice. Agents are classes that use language models to perform tasks, answer questions, or solve problems with the help of various tools. Executors are running instances of agents that carry out tasks based on the agent’s decisions and configured tools.


‘autonomous agents’ Image created by HackerNoon AI Image Generator

Overview of agents, tools, and executors
How they are used in practice
Detailed walkthrough of the mechanics that allow agents to work autonomously in the real world
How I used AI to generate this article

An overview of agents and executors

Agents

Agents are classes that leverage language models and are responsible for performing tasks, answering questions, or solving problems using various tools. They are often used to interact with data sources to assist with problem-solving.

Executors

An executor, on the other hand, is a running instance of an agent. It executes tasks based on the agent’s decisions and configured tools.

Tools

Tools are instances of classes that perform specific tasks or provide specific utilities. These classes are derived from a base class called Tool.

Examples of tools include:

SerpAPI: Uses the SerpAPI service to retrieve search engine result data.
ReadFileTool: Provides file reading capabilities.
WriteFileTool: Provides file writing capabilities.
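
To make the idea of a tool concrete, here is a minimal, self-contained sketch. SimpleTool and WordCountTool are hypothetical names used only for illustration; they stand in for Langchain's Tool base class rather than reproduce it. The sketch assumes a tool is just a name, a description the agent can read, and a method that does the work.

```typescript
// Hypothetical stand-in for a tool base class (not Langchain's actual Tool):
// a name and description the agent can read, plus a method that does the work.
abstract class SimpleTool {
  abstract name: string;
  abstract description: string;
  abstract call(input: string): Promise<string>;
}

// Illustrative tool that counts the words in its input.
class WordCountTool extends SimpleTool {
  name = "word-counter";
  description = "Counts the words in a piece of text. Input should be the text.";

  async call(input: string): Promise<string> {
    // Split on whitespace and drop empty fragments so blank input counts as 0.
    const words = input.trim().split(/\s+/).filter((w) => w.length > 0);
    return String(words.length);
  }
}

const wordCounter = new WordCountTool();
wordCounter.call("agents pick tools").then((count) => console.log(count));
```

The description matters as much as the implementation in this sketch, since it is the only thing a language model would see when deciding whether the tool is useful.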

The agents decide which tools to use, and the executors execute them.

Let’s start with an example

We will run through a quick workflow of how we actually use agents and executors in practice.

Initialisation

An AgentExecutor is initialised with an AgentExecutorInput, which contains the agent, tools, maximum iterations, and an optional early stopping method. The AgentExecutor constructor sets the following properties:

agent
tools
returnIntermediateSteps
maxIterations (default is 15)
earlyStoppingMethod (default is force)
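
As a rough illustration of how defaults like these might be resolved, here is a small self-contained sketch. ExecutorOptions and resolveOptions are hypothetical names, not Langchain's types, and the default for returnIntermediateSteps is an assumption because only the other two defaults are stated above.

```typescript
// Hypothetical option shape for illustration; not Langchain's actual types.
interface ExecutorOptions {
  maxIterations?: number;
  earlyStoppingMethod?: string;
  returnIntermediateSteps?: boolean;
}

// Fill in any option the caller left out with a default value.
function resolveOptions(
  options: ExecutorOptions = {}
): Required<ExecutorOptions> {
  return {
    maxIterations: options.maxIterations ?? 15,
    earlyStoppingMethod: options.earlyStoppingMethod ?? "force",
    // Assumed default: the list above does not state one for this property.
    returnIntermediateSteps: options.returnIntermediateSteps ?? false,
  };
}

console.log(resolveOptions({ maxIterations: 5 }));
```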

You can create an AgentExecutor by calling AgentExecutor.fromAgentAndTools with the required input fields. Here’s an example based on one from the Langchain repository:

import { AgentExecutor, ZeroShotAgent } from "langchain/agents";
import { LLMChain } from "langchain/chains";
import { OpenAI } from "langchain/llms/openai";
import { SerpAPI } from "langchain/tools";
import { Calculator } from "langchain/tools/calculator";

export const run = async () => {
  const model = new OpenAI({ temperature: 0 });
  const tools = [
    new SerpAPI(process.env.SERPAPI_API_KEY, {
      location: "Austin,Texas,United States",
      hl: "en",
      gl: "us",
    }),
    new Calculator(),
  ];
  // The agent plans through an LLM chain, which pairs the model with a prompt.
  const prompt = ZeroShotAgent.createPrompt(tools);
  const llmChain = new LLMChain({ llm: model, prompt });
  const agent = new ZeroShotAgent({ llmChain, allowedTools: ["search", "calculator"] });
  const agentExecutor = AgentExecutor.fromAgentAndTools({ agent, tools });
  console.log("Loaded agent.");
  
  const input = `Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?`;
  console.log(`Executing with input "${input}"...`);
  
  const result = await agentExecutor.call({ input });
  console.log(`Got output ${result.output}`);
};

In the code above, we are:

Using OpenAI as the language model
Creating a list of tools, namely SerpAPI and Calculator
Instantiating a type of agent known as the ZeroShotAgent (there are many types of agents with different use cases, and you can also create your own!)
Instantiating an AgentExecutor with the agent and tools
Using the .call method on the agent executor to receive the computed output from the agent

The execution process of an agent executor

When you call the .call() method of the AgentExecutor, it triggers the execution process. This is essentially a loop that performs the following actions:

The agent creates a plan (just text output by the language model) using the previous steps and inputs in the prompt.
The agent parses the text output and produces either an AgentFinish or an AgentAction. For instance, if the parsed text contains the prefix declared in the variable FINAL_ANSWER_ACTION (by default, Final Answer:), the agent returns an instance of AgentFinish; otherwise, it returns an AgentAction describing the tool to call and its input. This enables the AgentExecutor to interpret the decision made by the language model and carry out the actual execution of tools.
If the executor receives the AgentFinish object, the execution loop will be terminated, and the output will be returned using the getOutput function, which computes the final output based on the agent's finish step, intermediate steps, and additional data from the agent.
If the executor receives the AgentAction object, it will process the actions returned by the agent plan, calling the corresponding tools for each action and generating observations.
The action and observation in each step will be added to the execution steps array.
The loop will continue until the maximum number of iterations is reached or the shouldContinue function decides to stop.
If the maximum number of iterations is reached or early stopping is triggered, the agent will return a stopped response, and the output will be returned using the getOutput function.
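
The parsing step in the list above can be sketched as follows. This is a simplified, self-contained illustration rather than Langchain's actual parser: the type shapes are trimmed down, parseOutput is a hypothetical name, and the "Action:" / "Action Input:" text format is an assumption about how the model is prompted to respond.

```typescript
// Hypothetical, simplified shapes; Langchain's real types carry more detail.
type AgentAction = { tool: string; toolInput: string; log: string };
type AgentFinish = { returnValues: { output: string }; log: string };

const FINAL_ANSWER_ACTION = "Final Answer:";

// Turn raw model text into either a finish signal or a tool request.
function parseOutput(text: string): AgentAction | AgentFinish {
  const finishIndex = text.indexOf(FINAL_ANSWER_ACTION);
  if (finishIndex !== -1) {
    // Everything after the prefix is treated as the final answer.
    const output = text.slice(finishIndex + FINAL_ANSWER_ACTION.length).trim();
    return { returnValues: { output }, log: text };
  }

  // Assumed format: "Action: <tool>" on one line, "Action Input: <input>" on the next.
  const match = /Action:\s*(.*)\s*\nAction Input:\s*(.*)/.exec(text);
  if (match === null) {
    throw new Error(`Could not parse model output: ${text}`);
  }
  const [, tool = "", toolInput = ""] = match;
  return { tool: tool.trim(), toolInput: toolInput.trim(), log: text };
}

console.log(parseOutput("Thought: I know this\nFinal Answer: 42"));
console.log(parseOutput("Thought: look it up\nAction: search\nAction Input: Olivia Wilde boyfriend"));
```

Keeping the raw text in log means the executor can feed the model's own reasoning back into the next prompt alongside the tool's observation.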

This looping process of planning, parsing, and executing tools enables agents to leverage the decision-making power of language models, making it possible to build autonomous entities that can carry out more complicated tasks.
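
Here is a minimal sketch of such a loop, written without Langchain so it can run on its own. Every name in it (runExecutor, Planner, scriptedPlan, demoTools) is hypothetical, the language model is replaced by a scripted function, and the stopped-response message is illustrative. It is meant to show the shape of the loop, not the library's implementation.

```typescript
// Hypothetical, simplified shapes for illustration only.
type Action = { tool: string; toolInput: string };
type Finish = { output: string };
type Step = { action: Action; observation: string };
type Planner = (input: string, steps: Step[]) => Promise<Action | Finish>;
type ToolFn = (input: string) => Promise<string>;

async function runExecutor(
  plan: Planner,
  tools: Map<string, ToolFn>,
  input: string,
  maxIterations = 15
): Promise<{ output: string; steps: Step[] }> {
  const steps: Step[] = [];
  for (let i = 0; i < maxIterations; i++) {
    // Ask the agent what to do next, given everything seen so far.
    const decision = await plan(input, steps);
    if ("output" in decision) {
      return { output: decision.output, steps };
    }
    // Run the requested tool and record what it returned.
    const tool = tools.get(decision.tool);
    const observation =
      tool !== undefined
        ? await tool(decision.toolInput)
        : `${decision.tool} is not a valid tool`;
    steps.push({ action: decision, observation });
  }
  // Iteration budget exhausted: return a stopped response.
  return { output: "Agent stopped due to max iterations.", steps };
}

// Scripted stand-in for a language model: use the calculator once, then finish.
const scriptedPlan: Planner = async (_input, steps) =>
  steps.length === 0
    ? { tool: "calculator", toolInput: "2 + 2" }
    : { output: `The answer is ${steps.map((s) => s.observation).join(", ")}` };

const demoTools = new Map<string, ToolFn>([
  ["calculator", async () => "4"], // canned result; a real tool would compute it
]);

runExecutor(scriptedPlan, demoTools, "What is 2 + 2?").then((result) =>
  console.log(result.output)
);
```

Because the planner receives the accumulated steps on every iteration, each decision can build on earlier observations, which is what lets the loop handle multi-step questions.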

How I made this technical blog post using AI

Interestingly, this article was ~80% generated using a tool I built called Genie (with Langchain). The goal for Genie is to help developers understand complex code implementations in minutes instead of hours. Do give it a go if you’re interested! www.birdlabs.ai

Here are some of the questions I asked in order to understand the mechanics behind autonomous agents.