From Chatbots to AI Routing: An Essay

Written by aleksandrmalyshev | Published 2024/03/27
Tech Story Tags: ai | future-of-ai | llms | ai-routing | probablistic-automation | what-is-ai-routing | how-to-interact-with-ai | chatbots

TL;DR: In the evolving world of chatbot technology, we've shifted from simplistic state machines to advanced Large Language Models (LLMs) that use AI agents for more dynamic interactions. Alex Varga introduces the concept of AI routing to further enhance LLM efficiency by intelligently selecting the most suitable AI agent for a given task. This method, inspired by the human brain's processing system, promises not only improved performance but also cost-effectiveness. The transition from deterministic to probabilistic automation and the introduction of Chain-of-Thought (CoT) prompting and debates among AI agents mark significant milestones. These advancements in AI routing and model selection are pivotal for developers and businesses aiming to leverage AI for better outcomes.

The landscape of chatbot technology has seen significant evolution.

  • Chatbots 5-10 years ago were simple: they used State Machines.

  • Today, chatbots are built on Large Language Models that use AI Agents instead of traditional states.

I want to propose the concept of AI routing, which promises to enhance the efficiency and effectiveness of multiple LLM agents.

From Deterministic to Probabilistic Automation

Pre-GPT Chatbots: State Machines

Originally, chatbots were finite deterministic automata, behaving predictably with each user interaction. These chatbots functioned as "State Machines," moving from one state to another based on predefined rules. If you ever interacted with pre-GPT chatbots, most of them were straightforward: they asked for step 1, step 2, step 3. Every state was predictable.
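This kind of bot can be sketched as a tiny transition table; the states and prompts below are invented for illustration:

```python
# A minimal sketch of a pre-GPT "State Machine" chatbot: each state maps to
# a fixed prompt and a fixed next state, so every interaction is predictable.
TRANSITIONS = {
    # state: (prompt shown to the user, next state)
    "ask_name":  ("What is your name?", "ask_email"),
    "ask_email": ("What is your email?", "confirm"),
    "confirm":   ("Thanks, you're all set!", None),
}

def run_step(state, user_input):
    """Return the bot's reply and the next state. Note that the user's
    input never changes the path: the automaton is fully deterministic."""
    prompt, next_state = TRANSITIONS[state]
    return prompt, next_state
```

Whatever the user types, the path through the states is the same, which is exactly why these bots felt so rigid.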

However, this is not how real dialogue works: any conversation quickly moves beyond the scope of any predefined state.

Current Stage: Transition to Undetermined Automation

New-generation chatbots are non-deterministic: each state has a probability of transitioning to the next state. This probability depends on the sampling parameters (such as temperature and random seed) of the underlying LLM. That is why this model is a finite non-deterministic automaton, or even an infinite one.
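The probabilistic transition can be sketched as follows; the states and weights are made up for illustration:

```python
import random

# A sketch of a non-deterministic chatbot: each state has a probability
# distribution over next states, and the sampled path depends on the seed,
# much like LLM token sampling.
TRANSITIONS = {
    "greeting":        [("small_talk", 0.6), ("direct_question", 0.4)],
    "small_talk":      [("direct_question", 0.7), ("goodbye", 0.3)],
    "direct_question": [("goodbye", 1.0)],
}

def next_state(state, seed=None):
    rng = random.Random(seed)  # the same seed reproduces the same path
    states, weights = zip(*TRANSITIONS[state])
    return rng.choices(states, weights=weights, k=1)[0]
```

With a fixed seed the dialogue path is reproducible; without one, two identical conversations can diverge at any state.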

The Current Landscape of AI Interaction

Deterministic Supervision and Chain-of-Interaction

Chain-of-Thought (CoT) prompting is a recent advancement in prompting methods that encourages Large Language Models (LLMs) to explain their reasoning. Related work leverages multiple LLM calls within an expert framework to better predict the course of a dialogue, as demonstrated in research on psychiatric behavior understanding.

Put simply, it gives more logical structure to the input and shares more examples with the LLM. LLMs, much like humans, respond well to examples.

Without a doubt, it boosts the quality of the outcome. How effective is it? For some models, it may triple the quality of the responses.

CoT is widely used in research for Math Word Problems and other topics where logical reasoning is important.
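As a concrete sketch, a CoT prompt simply prepends a worked, step-by-step example to the question; the example problem and the `build_cot_prompt` helper below are illustrative, not taken from any paper:

```python
# One worked math example whose reasoning is written out step by step.
COT_EXAMPLE = (
    "Q: Roger has 5 tennis balls. He buys 2 cans with 3 balls each. "
    "How many balls does he have now?\n"
    "A: Roger starts with 5 balls. 2 cans with 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def build_cot_prompt(question):
    """Prepend the worked example so the model imitates its reasoning style."""
    return COT_EXAMPLE + "Q: " + question + "\nA: Let's think step by step."
```

The resulting prompt nudges the model to write out intermediate steps before committing to an answer, which is where the quality gains come from.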

Chain-of-Interaction: LLMs for Psychiatric Behavior Understanding

But CoT is not used only for math problems. In the article Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts, researchers added deterministic supervision: based on user interaction, the model predicts the next state of the dialogue.

Example: a psychiatric LLM realizes a person doesn't want to talk about drinking; the model must recognize a new state, "Change talk," and steer the dialogue in that direction.

This approach was proposed in the article "Chain-of-Interaction prompting," where the model predicts a chain:

  1. Interaction Definition
  2. Involvement Assessment
  3. Valence Analysis

Long story short, instead of predicting the next state of an LLM therapist in a single call, they make multiple LLM calls within an expert framework. This expert framework and sequence of LLM calls yield better predictions of the next state of the interaction.
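The three stages above can be sketched as a fixed pipeline of LLM calls; `call_llm` here is a hypothetical stand-in for any completion API, not the authors' actual implementation:

```python
# Sketch of Chain-of-Interaction: the next dialogue state is predicted by a
# fixed sequence of stages, each handled by its own LLM call, with every
# stage's output feeding the next stage's prompt.
def call_llm(prompt):
    # Placeholder: in practice this would call a real chat-completion API.
    return "[model output for: " + prompt[:40] + "...]"

STAGES = ["Interaction Definition", "Involvement Assessment", "Valence Analysis"]

def predict_next_state(dialogue):
    context = dialogue
    for stage in STAGES:
        context = call_llm("Stage: " + stage + "\nContext: " + context)
    return context
```

Each stage narrows the problem, so the final call predicts the next interaction state from a much richer context than a single prompt would carry.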

What if LLMs Start Debating?

The third example of the Chain-of-Thought approach is ChainLM. If there is a complex question, you can run a game similar to a GAN: one agent agrees, the other disagrees, and they iterate on the reasoning. Repeat until you get a response.

ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting

ChainLM is further trained on common-sense datasets, which highlights the importance of real-world observations. If you try the same setup with regular LLMs, the reasoning may fail, and the debate will end at a meaningless point.
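A debate loop of this kind can be sketched as two alternating roles; `propose` and `critique` are hypothetical stubs standing in for two LLM agents:

```python
# Sketch of recurrent LLM debates: one agent proposes an answer, the other
# critiques it, and the proposal is refined over a fixed number of rounds.
def propose(question, feedback):
    # Stub for agent A: refine the answer using the critic's feedback.
    return "answer considering: " + (feedback or "nothing yet")

def critique(question, answer):
    # Stub for agent B: push back on the current answer.
    return "critique of: " + answer

def debate(question, rounds=3):
    feedback, answer = "", ""
    for _ in range(rounds):
        answer = propose(question, feedback)
        feedback = critique(question, answer)
    return answer
```

In a real system the loop would also need a stopping criterion (agreement, or a judge model), otherwise weak models just circle around a meaningless point.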

Optimizing With AI Routing

Routing is an important step to help existing apps select the optimal LLM. GPT-4 can give accurate results across various topics, but it is expensive and slow, and therefore not efficient overall. Other models may be cheaper, faster, and better suited for specific tasks.
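A minimal cost-aware router can be sketched as follows; the model names, prices, and quality scores are invented for illustration:

```python
# Sketch of AI routing: pick the cheapest model whose expected quality for
# the task category clears a threshold, falling back to the strongest model.
MODELS = {
    # model: (cost per 1K tokens in USD, {task: expected quality, 0..1})
    "small-model":  (0.0005, {"chitchat": 0.85, "math": 0.40}),
    "medium-model": (0.0030, {"chitchat": 0.90, "math": 0.70}),
    "large-model":  (0.0300, {"chitchat": 0.95, "math": 0.92}),
}

def route(task, min_quality=0.8):
    candidates = [
        (cost, name)
        for name, (cost, quality) in MODELS.items()
        if quality.get(task, 0.0) >= min_quality
    ]
    if not candidates:
        return "large-model"  # fallback: strongest model
    return min(candidates)[1]  # cheapest qualifying model
```

With these numbers, small talk goes to the cheap model while math questions get routed to the expensive one, which is the whole economic argument for routing.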

Studies such as Routerbench have highlighted the potential for cost optimization through intelligent model selection, while Tryage introduces a real-time, intelligent routing system inspired by the human brain's architecture.

ROUTERBENCH: A Benchmark for Multi-LLM Routing System

Advanced Routing and Coordination Mechanisms

Choosing the right model for a task presents a unique challenge since different models perform differently on various datasets.

Caltech researchers have proposed the Tryage architecture, inspired by the human brain's structure. In the brain, the thalamus directs sensory inputs to specific parts of the cortex. Similarly, the Tryage system is designed to direct user prompts to the most suitable Large Language Models.

This approach allows for efficient processing of information, akin to how the brain handles tasks like identifying objects, detecting motion, localizing objects, and recognizing faces by routing these tasks to specialized areas for processing.

Further Ideas

If you manage the coordination of AI Agents, you want to get more value for the company.

You don’t want to use unsafe agents in medical care startups, and you don't want to use clickbait advertisement agents in the sales scripts because they will start promoting NSFW content.

Managing AI Agents is a trade-off between short-term value and long-term value.

  • Short-term values can be money, click-through-rate, session length
  • Long-term values are company reputation, safety

Throughout its lifetime, each chat with the AI assistant takes place within some context, labeled Context_j.

In mathematical terms, the goal is to evaluate and rank each Large Language Model (LLM_i) based on a value metric within every given context (Context_j).

Therefore, for each specific context (Context_j), the appropriate LLM_i must be chosen to ensure that the value in the subsequent context (Context_(j+1)) is maximized, along with the value in the ultimate context (Context_inf).
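Assuming the value of a context can be scored numerically, the selection rule above can be written (in hypothetical notation) as:

```latex
\mathrm{LLM}^{*}(\mathrm{Context}_j)
  = \arg\max_{i}\;
    \mathbb{E}\!\left[\,\mathrm{Value}(\mathrm{Context}_{j+1})
      \mid \mathrm{LLM}_i,\ \mathrm{Context}_j\,\right]
```

That is, for the current context we pick the model that maximizes the expected value of the context that follows.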

Let's imagine your AI is a family coach. You had an argument with your spouse, and you asked, "AI, what should I do?"

And it must give you a piece of advice.

  • Visiting a strip club requires money and provides immediate stress relief, but may lead to questions from your spouse shortly afterward.

  • Meditation costs nothing, offers immediate stress relief, and does not result in questions from your spouse.

  • Talking with your spouse may not provide immediate stress relief, but it offers benefits far beyond those of a strip club.

  • Returning to work generates extra income, but does not directly address the issue at hand.

Even in this example, our value is a combination of personal benefits and group benefits. And this example is radical: in real life, we have to work with an epsilon-greedy exploration function and continuous A/B tests of AI agents.

These challenges are typically addressed with a multi-armed bandit approach that balances exploration of new strategies against exploitation of known effective ones. On top of this, existing LLMs (LLM_i) can be enhanced through Chain-of-Thought prompting and recurrent LLM debates.
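An epsilon-greedy bandit over LLM agents can be sketched like this; the agent names and reward scale are illustrative:

```python
import random

# Sketch of a multi-armed bandit for agent selection: with probability
# epsilon explore a random agent, otherwise exploit the agent with the best
# running-average reward observed so far.
class EpsilonGreedyRouter:
    def __init__(self, agents, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {a: 0 for a in agents}
        self.values = {a: 0.0 for a in agents}  # running mean reward

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))   # explore
        return max(self.values, key=self.values.get)  # exploit

    def update(self, agent, reward):
        self.counts[agent] += 1
        n = self.counts[agent]
        self.values[agent] += (reward - self.values[agent]) / n
```

The reward here is whatever value metric you A/B test on (clicks, session length, or a safety score), which is how short-term and long-term value enter the loop.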

If you found this discussion intriguing and are curious about exploring the possibilities of AI Router further, I'd be thrilled to dive deeper into this topic with you. The engineering challenges we face in advancing AI technologies are complex, but it's precisely these challenges that fuel our progress and innovation. If you're interested in a pilot project or simply wish to exchange ideas, please don't hesitate to get in touch.

Together, we have the potential to shape a brighter, more innovative future.


Written by aleksandrmalyshev | Backend Engineer, Growth Hacker, ex-CTO. Experienced in digital, food, data science. Disrupting the world with tech 🚀
Published by HackerNoon on 2024/03/27