Today, we're announcing the WunderGraph OpenAI integration/Agent SDK to simplify the creation of AI-enhanced APIs and AI Agents for Systems Integration on Autopilot. At a high level, this integration enables two things:
Build AI-enhanced APIs with OpenAI that return structured data (JSON) instead of plain text.
Build AI Agents that can perform complex tasks leveraging your existing REST, GraphQL, and SOAP APIs, as well as your databases and other systems.
Before we dive deep into the problem and technical details, let's have a look at two examples.
Here's a simple example that shows how we can use OpenAI to create an Agent that can call multiple APIs and return structured data (JSON) conforming to our defined API schema.
// .wundergraph/operations/openai/GetWeatherByCountry.ts
import { createOperation, z } from '../../generated/wundergraph.factory';

export default createOperation.query({
  input: z.object({
    country: z.string(),
  }),
  description: 'This operation returns the weather of the capital of the given country',
  handler: async ({ input, openAI, log }) => {
    // we cannot trust the user input, so we've got a helper function
    // that parses the user input and validates it against a schema
    const parsed = await openAI.parseUserInput({
      userInput: input.country,
      // we can use zod to define the schema
      // if OpenAI cannot parse the user input,
      // or zod validation fails, an error is thrown
      schema: z.object({
        country: z.string().nonempty(),
      }),
    });
    // using the parseUserInput helper function is optional,
    // but it's recommended whenever you cannot trust the user input
    // e.g. the user could have entered "Germany" or "DE",
    // or another prompt that is not a country at all and would confuse OpenAI
    // next, we create an agent to perform the actual task
    const agent = openAI.createAgent({
      // functions takes an array of functions that the agent can use
      // these are our existing WunderGraph Operations that we've previously defined
      // A WunderGraph Operation can interact with your APIs and databases
      // You can use GraphQL and TypeScript to define Operations
      // TypeScript Operations (like this one right here) can host Agents
      // So you can also call other Agents from within an Agent
      functions: [{ name: 'CountryByCode' }, { name: 'weather/GetCityByName' }],
      // We want to get structured data (JSON) back from the Agent,
      // so we define the output schema using zod again
      structuredOutputSchema: z.object({
        city: z.string(),
        country: z.string(),
        temperature: z.number(),
      }),
    });
    // Finally, we execute the agent with a prompt
    // The Agent will automatically fetch country data from the CountryByCode Operation
    // and the weather data from the weather/GetCityByName Operation
    // It will then generate a response using the schema we've defined
    return agent.execWithPrompt({
      prompt: `What's the weather like in the capital of ${parsed.country}?`,
    });
  },
});
How about extracting metadata from a website and exposing the functionality as a JSON API? Sounds simple enough, right?
// .wundergraph/operations/openai/GetWebsiteInfo.ts
import { createOperation, z } from '../../generated/wundergraph.factory';

export default createOperation.query({
  input: z.object({
    url: z.string(),
  }),
  description: 'This operation returns the title, description, h1 and a summary of the given website',
  handler: async ({ input, openAI, log }) => {
    const agent = openAI.createAgent({
      model: 'gpt-3.5-turbo-16k-0613',
      functions: [
        {
          name: 'web/load_url',
          // we're using the web/load_url function to load the content (HTML) of a website
          // our model is only capable of processing 16k tokens at once,
          // so we need to paginate the content and process it in chunks
          // the Agent SDK will automatically split the content and merge the responses
          pagination: {
            // we set the page size to 15kb, you can play around with this value
            pageSize: 1024 * 15,
            // we also set a max page limit to prevent excessive usage
            maxPages: 3,
          },
        },
        {
          // we can use another Operation to summarize the content
          // as the path suggests, it's using an Agent as well under the hood,
          // meaning that we're composing Agents here
          name: 'openai/summarize_url_content',
        },
      ],
      // we define the output schema using zod again
      // without this, our API would return plain text,
      // which would make it hard to consume for other systems
      structuredOutputSchema: z.object({
        title: z.string(),
        description: z.string(),
        h1: z.string(),
        summary: z.string(),
      }),
    });
    // we execute the agent with a prompt
    return agent.execWithPrompt({
      prompt: `Load the content of the URL: ${input.url}
        You're an HTML parser. Your job is to extract the title, description and h1 from the HTML.
        Do not include the HTML tags in the result.
        Don't change the content, just extract the information.
        Once this is done, add a summary of the website.
      `,
    });
  },
});
The second example is a bit more involved, but it shows how you can describe more complex tasks with a prompt and have the AI Agent execute them for you.
Additionally, we're passing an Operation as a function to the Agent, which is another Agent under the hood, meaning that this API is actually composed of multiple Agents.
With these two examples, you should get a good idea of what's possible with the WunderGraph OpenAI integration.
Let's now rewind a bit and talk about the problems we're trying to solve here.
When trying to build AI-enhanced APIs and Agents, you'll quickly realize that there are a couple of challenges that you need to overcome. Let's quickly define what we mean by AI-enhanced APIs and Agents and then talk about the challenges.
An AI-enhanced API is an API that accepts an input in a predefined format and returns structured data (e.g., JSON), allowing it to be described using a schema (e.g., OpenAPI, GraphQL, etc.).
Tools like ChatGPT are fun to play with, but they're not very useful when you want to build APIs that can be consumed by other systems.
So, the bare minimum for an AI-enhanced API is that we can describe it using a schema; in our case, we're using JSON Schema, which plays nicely with OpenAPI and OpenAI, as you'll see later.
An AI Agent is a dialog between a large language model (e.g., GPT-3) and a computer program (e.g., a WunderGraph Operation) that is capable of performing a task.
The dialog is initiated by a prompt (e.g., a question or a task description).
We can provide additional functionality to the Agent by passing it functions, which we also have to describe using a schema.
Once the dialog is initiated, the Agent can come back to us, asking to execute one of the functions we've provided.
It will provide the input to call the function, which will follow the schema we've defined.
We execute the function and add the result to the dialog, and the Agent will continue performing the task until it's done. Once the Agent is done, it will return the result to us; ideally in a format that we can describe using a schema.
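To make this dialog loop concrete, here's a minimal sketch of the same idea using the openai npm package directly (v4-style API), without WunderGraph; the getCapital function and its JSON Schema are made up for illustration:

import OpenAI from 'openai';

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// a made-up function the model is allowed to call, described via JSON Schema
const functions = [
  {
    name: 'getCapital',
    description: 'Returns the capital city of a country',
    parameters: {
      type: 'object',
      properties: { country: { type: 'string' } },
      required: ['country'],
    },
  },
];

async function getCapital(country: string) {
  // stand-in for a real API call
  return { capital: country === 'Germany' ? 'Berlin' : 'unknown' };
}

async function runDialog(prompt: string) {
  const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [{ role: 'user', content: prompt }];
  while (true) {
    const res = await client.chat.completions.create({
      model: 'gpt-3.5-turbo-0613',
      messages,
      functions,
    });
    const msg = res.choices[0].message;
    if (msg.function_call) {
      // the model asks us to execute one of our functions with schema-conforming arguments
      const args = JSON.parse(msg.function_call.arguments);
      const result = await getCapital(args.country);
      messages.push(msg);
      messages.push({ role: 'function', name: msg.function_call.name, content: JSON.stringify(result) });
      continue;
    }
    // no more function calls: the dialog is done
    return msg.content;
  }
}

The Agent SDK hides this loop behind createAgent and execWithPrompt, but the underlying dialog is the same.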
If you've used ChatGPT before, you'll know that it's fun to play with if a powerful enough "Agent" sits in front of it, like a human (you).
But what if you want to build an API that can be consumed by other systems? How are services supposed to consume plain text without any structure?
When building an API, we usually have to deal with user input.
We can ask the user to provide a country name as the input to our API, but what if the user provides a prompt instead of a country name that is designed to trick the AI? This is called prompt injection, and it's a real problem when building AI-enhanced APIs.
LLMs are powerful, but they're not infinitely powerful.
They can only process a limited number of tokens at once.
This means that we have to paginate the input, process it in chunks, and then merge the results back together, all in a structured way so that we can parse the result later.
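The Agent SDK does this for you (as you'll see later via the pagination option), but as a rough illustration of the idea, splitting the input and merging the partial results could look like this; summarizeChunk stands in for one LLM call per chunk:

// a rough sketch of chunked processing, not the SDK's actual implementation
function splitIntoChunks(text: string, chunkSize: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

async function processInChunks(
  content: string,
  summarizeChunk: (chunk: string) => Promise<string>
): Promise<string> {
  const partialResults: string[] = [];
  for (const chunk of splitIntoChunks(content, 1024 * 15)) {
    // each chunk is processed individually so it fits into the token limit
    partialResults.push(await summarizeChunk(chunk));
  }
  // the partial results are merged back together into a single result
  return partialResults.join('\n');
}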
You will usually start building lower-level Agents that perform a specific task, like loading the content of a website or summarizing the content of a website.
Once you have these Agents, you want to be able to compose them to build more powerful higher-level Agents.
How can we make it easy to compose AI Agents?
OpenAI allows you to describe functions that can be called by the Agent. The challenge is that you have to describe the functions using plain JSON Schema. This means that you cannot directly call REST, GraphQL or SOAP APIs, or even databases.
You have to describe the function using JSON Schema, and then implement a mechanism that calls APIs and databases on behalf of the Agent.
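Without an abstraction layer, that bridging code is on you. Here's a sketch of what this hand-maintained glue tends to look like, with a made-up getWeather function backed by a hypothetical REST endpoint:

// hand-written JSON Schema description of a function the model may call
const functions = [
  {
    name: 'getWeather',
    description: 'Returns the current weather for a city',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
];

// the dispatch mechanism that actually calls the API on behalf of the Agent
async function callFunction(name: string, args: Record<string, any>) {
  switch (name) {
    case 'getWeather': {
      // hypothetical REST endpoint; in reality you'd also handle auth, errors, retries, ...
      const res = await fetch(`https://api.example.com/weather?city=${encodeURIComponent(args.city)}`);
      return res.json();
    }
    default:
      throw new Error(`unknown function: ${name}`);
  }
}

Every new API you want the Agent to use means another schema to hand-write and another branch in the dispatcher to maintain.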
LLMs can generate GraphQL Operations or even SQL statements, but keep in mind that these need to be validated and sanitized before they can be executed.
In addition, requiring an LLM to manually generate GraphQL Operations, REST API calls or SQL statements comes with another problem:
You have to describe the GraphQL Schema, REST API, or the database schema, and all of this input will count towards the token limit of the LLM. This means that if you provide a GraphQL Schema with 16k tokens to a 16k-limited LLM, there's no space left for the actual prompt.
Wouldn't it be nice if we could describe just a few "Operations" that are useful to a specific Agent?
Yes, absolutely! But then there's another problem:
How can we describe Operations in a unified way that is compatible with OpenAI but works across different APIs like REST, SOAP, GraphQL, and databases?
Let's now talk about the solution to these problems using the WunderGraph OpenAI integration.
If you're not yet familiar with WunderGraph, it's an Open Source API Integration / BFF (Backend for Frontend) / Programmable API Gateway toolkit.
At the core of WunderGraph is the concept of "API Dependency Management/API Composition."
WunderGraph allows you to describe a set of heterogeneous APIs (REST, GraphQL, SOAP, Databases, etc.) using a single schema.
From this description, WunderGraph will generate a unified API that you can define "Operations" for.
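As a rough sketch (option names may differ slightly between SDK versions, and the usual code-generation options are omitted), adding API dependencies in wundergraph.config.ts could look like this; the public countries GraphQL API is real, while the weather OpenAPI document is a stand-in:

// wundergraph.config.ts (sketch)
import { configureWunderGraphApplication, introspect } from '@wundergraph/sdk';

// a public GraphQL API, exposed under the "countries" namespace
const countries = introspect.graphql({
  apiNamespace: 'countries',
  url: 'https://countries.trevorblades.com/',
});

// a hypothetical weather REST API described by an OpenAPI document
const weather = introspect.openApi({
  apiNamespace: 'weather',
  source: {
    kind: 'file',
    filePath: './weather.yaml',
  },
});

configureWunderGraphApplication({
  apis: [countries, weather],
});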
Operations are the core building blocks of exposing functionality on top of your APIs. An Operation is essentially a function that can be called by a client. Both the input and the output of an Operation are described using JSON Schema.
All Operations exposed by a WunderGraph Application are described using an OpenAPI Specification (OAS) document or a Postman Collection, so it's easy to consume them from any programming language.
Having the "Operations" abstraction on top of your API Dependency Graph allowed us to keep the Agent as simple as it is.
All you need to do is add your API dependencies, define a couple of Operations that are useful to your Agent, and pass them along with a prompt to the Agent.
It doesn't matter if you're using REST, GraphQL, SOAP, a Database, or just another TypeScript function as an Operation, they all look the same to the Agent, and they all follow the same semantics.
Let's now talk about the challenges we mentioned earlier and how the WunderGraph OpenAI integration solves them.
By default, OpenAI will return plain text. So, when OpenAI is done processing our prompt, we'll get back a string of text. How can we turn this into structured data?
Let's recall the Agent definition from earlier:
const agent = openAI.createAgent({
  functions: [{ name: 'CountryByCode' }, { name: 'weather/GetCityByName' }],
  structuredOutputSchema: z.object({
    city: z.string(),
    country: z.string(),
    temperature: z.number(),
  }),
});
const out = await agent.execWithPrompt({
  prompt: `What's the weather like in ${country}?`, // e.g. Germany
});
console.log(out.structuredOutput.city); // Berlin
We pass two functions to the Agent and define a schema that describes the output we expect from the Agent using the Zod library.
Internally, we will compile the schema to JSON Schema.
Once the Agent is done, we'll create a new "dialog" asking the Agent to call a dedicated output function.
To describe the input we're expecting to receive from the Agent, we'll use the generated JSON Schema.
This prompts the Agent to call that output function with arguments that match the schema.
We can then use the Zod library to parse the result and raise an error if the result doesn't match the schema we've defined.
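Under the hood, this boils down to turning the zod schema into a JSON Schema function definition that the model is asked to call. Here's a simplified sketch using the zod-to-json-schema package; the function name "out" is illustrative, the name the SDK actually uses internally may differ:

import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

const structuredOutputSchema = z.object({
  city: z.string(),
  country: z.string(),
  temperature: z.number(),
});

// a function definition the model is asked to call with the final result
const outputFunction = {
  name: 'out', // illustrative name
  description: 'Call this function with the final, structured result',
  parameters: zodToJsonSchema(structuredOutputSchema),
};

// when the model "calls" this function, we parse and validate its arguments
function parseStructuredOutput(functionCallArguments: string) {
  // throws if the model's arguments don't match the schema
  return structuredOutputSchema.parse(JSON.parse(functionCallArguments));
}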
As WunderGraph Operations are using TypeScript, we can infer the TypeScript types from the zod schema description, which means that the result of "out" will be typed automatically.
More importantly, we're also using the TypeScript compiler to infer the response type of Operations in general.
So if you're returning out.structuredOutput from an Operation, another Operation can call our Operation in a type-safe way, or even use our Operation as a function for another Agent.
Let's recall another example from earlier:
export default createOperation.query({
  input: z.object({
    country: z.string(),
  }),
  description: 'This operation returns the weather of the capital of the given country',
  handler: async ({ input, openAI, log }) => {
    const parsed = await openAI.parseUserInput({
      userInput: input.country,
      schema: z.object({
        country: z.string().nonempty(),
      }),
    });
    // Agent code goes here
  },
});
If we passed the user input directly to our Agent, we would be vulnerable to prompt injection. This means that a malicious user could craft a prompt that causes the Agent to execute arbitrary code.
To prevent this, we first run the user input through the openAI.parseUserInput function.
This function parses the input into our desired schema and validates it.
Furthermore, it checks for prompt injection attacks and throws an error if it detects one.
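If you'd rather handle that rejection yourself instead of letting the error bubble up from the Operation, you can wrap the call; a small sketch:

// sketch: reacting to rejected input instead of letting the error bubble up
let parsed: { country: string };
try {
  parsed = await openAI.parseUserInput({
    userInput: input.country,
    schema: z.object({ country: z.string().nonempty() }),
  });
} catch (err) {
  // validation failed or a prompt-injection attempt was detected
  throw new Error('Please provide a valid country name.');
}
// parsed.country is now safe to interpolate into the Agent prompt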
Let's say you'd like to summarize the content of a website.
Websites can be of arbitrary length, so we cannot just pass the content of the website to the Agent because LLMs like GPT have a token limit.
Instead, what we can do is split the content into pages, process each page individually, and then combine the results.
Here's an abbreviated example of how you can apply pagination to your Agent:
const agent = openAI.createAgent({
  model: 'gpt-3.5-turbo-16k-0613',
  functions: [
    {
      name: 'web/load_url',
      // we're using the web/load_url function to load the content (HTML) of a website
      // our model is only capable of processing 16k tokens at once,
      // so we need to paginate the content and process it in chunks
      // the Agent SDK will automatically split the content and merge the responses
      pagination: {
        // we set the page size to 15kb, you can play around with this value
        pageSize: 1024 * 15,
        // we also set a max page limit to prevent excessive usage
        maxPages: 3,
      },
    },
  ],
});
In this case, we're splitting the website content into pages of 15 KB each, up to a maximum of 3 pages. The Agent will process each page individually and then combine the results.
If you recall the second example, we were passing a function named openai/summarize_url_content to our Agent.
This Operation contains the logic to summarize the content of a website, itself using an Agent under the hood.
In the prompt to our metadata extraction Agent, we ask it to summarize the content of the website, so our Agent will use the openai/summarize_url_content function to do so.
As you can wrap Agents in an Operation, you can easily compose multiple Agents together.
The recommended way to do so is to start creating low-level Agents that are capable of doing a single thing.
You can then compose these low-level Agents into higher-level Agents that perform two or more tasks, and so on.
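Because every Operation is callable from a TypeScript Operation via the operations client, a higher-level Operation (or Agent) can also invoke a lower-level Agent directly. Here's a rough sketch; the summarize_many Operation is made up, and it assumes the openai/summarize_url_content Operation from above exists:

// .wundergraph/operations/openai/summarize_many.ts (illustrative)
import { createOperation, z } from '../../generated/wundergraph.factory';

export default createOperation.query({
  input: z.object({
    urls: z.array(z.string()),
  }),
  description: 'Summarizes multiple URLs by calling the lower-level summarize Agent for each',
  handler: async ({ input, operations }) => {
    const summaries: string[] = [];
    for (const url of input.urls) {
      // call the lower-level, Agent-backed Operation in a type-safe way
      const { data, error } = await operations.query({
        operationName: 'openai/summarize_url_content',
        input: { url },
      });
      if (error || !data) {
        throw new Error(`failed to summarize ${url}`);
      }
      summaries.push(data.summary);
    }
    return { summaries };
  },
});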
As explained earlier, WunderGraph Operations are an abstraction on top of your API Dependency Graph, allowing you to integrate any API into an AI Agent.
You can provide Operations in two ways to the Agent, either by using a GraphQL Operation against your API Graph or by creating a custom TypeScript Operation, which might contain custom business logic, call other APIs, or even other Agents.
Most importantly, we need a way to describe the input and functionality of an Operation to the LLM Agent.
All of this is abstracted away by the WunderGraph Agent SDK and works out of the box.
All you need to do is add a description to your Operation and the Agent SDK will take care of the rest.
Here's an example using a GraphQL Operation:
# .wundergraph/operations/CountryByCode.graphql
# Loads country information by code, the code needs to be in capital letters, e.g. DE for Germany
query ($code: ID!) {
  countries_country(code: $code) {
    code
    name
    currencies
    capital
  }
}
The Agent SDK will automatically parse the GraphQL Operation and generate a JSON Schema for the input including the description.
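The result is roughly a function definition like the following; this is illustrative, and the exact shape the SDK generates may differ:

// roughly what the generated function definition for the CountryByCode Operation could look like
const countryByCodeFunction = {
  name: 'CountryByCode',
  description: 'Loads country information by code, the code needs to be in capital letters, e.g. DE for Germany',
  parameters: {
    type: 'object',
    properties: {
      code: { type: 'string' },
    },
    required: ['code'],
  },
};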
Here's an example using a custom TypeScript Operation:
// .wundergraph/operations/openai/summarize_url_content.ts
import { createOperation, z } from '../../generated/wundergraph.factory';

export default createOperation.query({
  input: z.object({
    url: z.string(),
  }),
  response: z.object({
    summary: z.string(),
  }),
  description: 'Summarize the content of a URL',
  handler: async ({ operations, input, log, openAI }) => {
    // agent code goes here
  },
});
Again, the Agent SDK will parse the TypeScript Operation as well and generate a JSON Schema from the zod schema, adding the description ("Summarize the content of a URL") so that the LLM Agent understands what the Operation is doing.
If you need more info on how to get started with WunderGraph and OpenAI, check out the OpenAI Integration Docs.
PS: Make sure you're not leaking your API key in your GitHub repo!
In this article, we've learned how to use the WunderGraph Agent SDK to create AI Agents that can integrate any API.
We've tackled some of the most common problems when building AI Agents, like prompt injection, pagination, and Agent composition.
If you like the work we're doing and want to support us, give us a star on GitHub.
I'd love to hear your thoughts on this topic, so feel free to reach out to me on Twitter, or join our Discord server to chat about it.