The landscape of chatbot technology has seen significant evolution.
I want to propose the concept of AI routing, which promises to enhance the efficiency and effectiveness of multiple LLM agents.
Originally, chatbots were deterministic finite automata, behaving predictably with each user interaction. These chatbots functioned as state machines, moving from one state to another based on predefined rules. If you ever interacted with pre-GPT chatbots, most of them were straightforward: they asked for step 1, then step 2, then step 3. Every state was predictable.
However, this is not how we actually converse: a real dialogue quickly drifts beyond the scope of any predefined state.
New-generation chatbots are non-deterministic: from each state there is a probability of moving to each possible next state, and that probability is shaped by the sampling of the underlying LLM (its output distribution and random seed). That is why this model is a non-deterministic finite automaton, or even a non-deterministic infinite automaton.
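To make the contrast concrete, here is a minimal sketch (not from any of the cited papers) comparing a rule-based state machine with an LLM-driven chatbot whose next state is sampled from a probability distribution. The `llm_transition_probs` function is a hypothetical stand-in for an actual LLM call.

```python
import random

# Deterministic finite automaton: every (state, event) pair maps to exactly one next state.
DETERMINISTIC_FLOW = {
    ("ask_name", "answered"): "ask_email",
    ("ask_email", "answered"): "ask_issue",
    ("ask_issue", "answered"): "done",
}

def deterministic_step(state, event):
    # Falls back to the same state if the input is not covered by a rule.
    return DETERMINISTIC_FLOW.get((state, event), state)

# LLM-driven chatbot: the next state is sampled from a distribution produced
# by the model, so two identical dialogues can diverge.
def llm_transition_probs(state, user_message):
    """Hypothetical stand-in for an LLM call that scores candidate next states."""
    return {"ask_issue": 0.6, "small_talk": 0.3, "escalate_to_human": 0.1}

def nondeterministic_step(state, user_message, seed=None):
    rng = random.Random(seed)
    probs = llm_transition_probs(state, user_message)
    states, weights = zip(*probs.items())
    return rng.choices(states, weights=weights, k=1)[0]

print(deterministic_step("ask_name", "answered"))          # always "ask_email"
print(nondeterministic_step("greeting", "hi, I'm stuck"))  # varies from run to run
```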
Chain of thought (CoT) prompting
Put simply, it adds more reasoning structure to the input by sharing worked examples with the LLM. LLMs respond well to examples, just as humans do.
Without a doubt, it boosts the quality of the output. How effective is it? For some models, it can roughly triple the quality of reasoning results.
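As a concrete illustration, here is a minimal few-shot CoT prompt sketch; the wording of the template and the `build_cot_prompt` helper are just an example, not a prescribed format.

```python
# Minimal chain-of-thought prompt: the few-shot example shows the model
# the intermediate reasoning steps we want it to imitate.
COT_PROMPT = """\
Q: A cafe sold 23 coffees in the morning and 17 in the afternoon.
Each coffee costs $4. How much revenue did the cafe make?
A: Let's think step by step.
Morning + afternoon = 23 + 17 = 40 coffees.
Revenue = 40 * 4 = $160.
The answer is 160.

Q: {question}
A: Let's think step by step.
"""

def build_cot_prompt(question):
    return COT_PROMPT.format(question=question)

print(build_cot_prompt("A train travels 60 km/h for 2.5 hours. How far does it go?"))
```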
CoT is widely used in research for Math Word Problems and other topics where logical reasoning is important.
But CoT is not used only for math problems.
Example: a psychiatric LLM realizes that a person doesn't want to talk about drinking; the model must recognize a new state, "change talk," and steer the dialogue toward it.
This approach was proposed in the paper on "Chain-of-Interaction" prompting, where the model predicts a chain of interaction states.
Long story short, instead of predicting the next state of an LLM therapist with a single call, they make multiple LLM calls guided by an expert framework. This expert framework, combined with the sequence of LLM calls, yields better predictions of the next interaction state.
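A rough sketch of that idea, heavily simplified and not the paper's exact pipeline: each stage is a separate LLM call whose output feeds the next, and the final call predicts the next interaction state. The `call_llm` argument is a hypothetical helper, stubbed here so the example runs without any API.

```python
from typing import Callable

# Hypothetical single-call helper type; in practice this would wrap your LLM client.
LLMCall = Callable[[str], str]

def chain_of_interaction(transcript: str, call_llm: LLMCall) -> str:
    """Simplified multi-call chain: extract -> interpret with expert framing -> predict next state."""
    # Step 1: summarize what the patient actually said.
    patient_summary = call_llm(
        f"Summarize the patient's statements in this therapy transcript:\n{transcript}"
    )
    # Step 2: interpret the summary through an expert coding framework
    # (e.g. motivational-interviewing categories such as 'change talk').
    expert_view = call_llm(
        "Using motivational-interviewing codes, label the patient's attitude "
        f"toward change in this summary:\n{patient_summary}"
    )
    # Step 3: predict the next dialogue state for the therapist to target.
    return call_llm(
        "Given the expert labels below, what dialogue state should the therapist "
        f"move to next (e.g. 'change talk', 'sustain talk', 'neutral')?\n{expert_view}"
    )

# Usage with a stubbed LLM.
fake_llm = lambda prompt: f"[model output for: {prompt[:40]}...]"
print(chain_of_interaction("Patient: I don't really want to discuss my drinking.", fake_llm))
```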
The third example of the Chain of Thought approach is ChainLM. Given a complex question, you can run a game similar to a GAN: one agent agrees, the other disagrees, and they iterate on the reasoning. Repeat until you converge on a response.
The model is further trained on common-sense datasets, which highlights the importance of grounding in real-world observations. If you run the same debate with regular LLMs, the reasoning may fail and the debate can end at a meaningless point.
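Here is a toy sketch of such an iterative debate loop. It is not the ChainLM training procedure, just an illustration of the agree/disagree iteration, with made-up `propose` and `critique` agents standing in for LLM calls.

```python
def debate(question, propose, critique, max_rounds=3):
    """Toy debate loop: one agent proposes an answer, the other critiques it,
    and the proposer revises until the critic agrees or rounds run out."""
    answer = propose(question, feedback=None)
    for _ in range(max_rounds):
        feedback = critique(question, answer)
        if feedback is None:  # the critic agrees, stop iterating
            break
        answer = propose(question, feedback=feedback)
    return answer

# Stubbed agents so the sketch is self-contained.
def propose(question, feedback):
    return "revised answer" if feedback else "initial answer"

def critique(question, answer):
    return None if answer == "revised answer" else "the reasoning skips a step"

print(debate("Is 1009 prime?", propose, critique))
```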
Routing is an important step in helping existing apps select the optimal LLM. GPT-4 can give accurate results across many topics, but it is expensive and slow, while other models may be cheaper, faster, and even better for specific tasks.
Studies such as RouterBench have highlighted the potential for cost optimization through intelligent model selection, while Tryage introduces a real-time, intelligent routing system inspired by the architecture of the human brain.
Choosing the right model for a task presents a unique challenge since different models perform differently on various datasets.
Caltech researchers have proposed the Tryage architecture, inspired by the structure of the human brain. In the brain, the thalamus directs sensory inputs to specific parts of the cortex; similarly, the Tryage system is designed to direct user prompts to the most suitable large language models.
This approach allows for efficient processing of information, akin to how the brain handles tasks like identifying objects, detecting motion, localizing objects, and recognizing faces by routing these tasks to specialized areas for processing.
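Conceptually, such a router boils down to a scoring step followed by a dispatch step. The sketch below is only an illustration of that idea, not Tryage's actual implementation; the model names, costs, and the naive domain scorer are all made up.

```python
# Toy prompt router: score each candidate model for the incoming prompt,
# penalize cost, and dispatch to the best trade-off.
CANDIDATES = {
    # model name: (cost per 1k tokens in $, domains it is assumed to be strong in)
    "big-general-model":  (0.03, {"general", "reasoning"}),
    "small-cheap-model":  (0.001, {"general"}),
    "code-special-model": (0.002, {"code"}),
}

def score(prompt, domains):
    """Naive domain scorer; a real router would use a learned classifier."""
    if "def " in prompt or "```" in prompt:
        return 1.0 if "code" in domains else 0.2
    return 0.8 if "general" in domains else 0.3

def route(prompt, cost_weight=10.0):
    best_model, best_value = None, float("-inf")
    for model, (cost, domains) in CANDIDATES.items():
        value = score(prompt, domains) - cost_weight * cost
        if value > best_value:
            best_model, best_value = model, value
    return best_model

print(route("def fib(n): ..."))          # -> code-special-model
print(route("Summarize this contract"))  # -> small-cheap-model
```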
If you manage the coordination of AI agents, you want them to generate more value for the company.
You don't want to use unsafe agents in a medical-care startup, and you don't want clickbait-advertising agents in your sales scripts, because they will start promoting NSFW content.
Managing AI Agents is a trade-off between short-term value and long-term value.
Throughout its lifetime, each chat with the AI assistant takes place in some context, labeled Context_j.
In mathematical terms, the goal is to evaluate and rank each Large Language Model (LLM_i) based on a value metric within every given context (Context_j).
Therefore, for each specific context (Context_j), the appropriate LLM_i must be chosen so that the value in the subsequent context (Context_(j+1)) is maximized, as well as the value in the ultimate context (Context_inf).
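In code, a greedy version of this selection rule might look like the following sketch; `estimate_value` and the toy value table are hypothetical stand-ins for whatever value metric you track per (LLM_i, Context_j) pair.

```python
def choose_llm(context_j, llm_candidates, estimate_value):
    """Greedy selection: pick the LLM_i with the highest estimated value for Context_j."""
    return max(llm_candidates, key=lambda llm_i: estimate_value(llm_i, context_j))

# Toy value table standing in for measured per-context performance.
VALUE = {("gpt-4-like", "legal"): 0.9,  ("small-model", "legal"): 0.6,
         ("gpt-4-like", "chitchat"): 0.7, ("small-model", "chitchat"): 0.75}

pick = choose_llm("chitchat", ["gpt-4-like", "small-model"],
                  lambda llm, ctx: VALUE[(llm, ctx)])
print(pick)  # -> small-model
```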
Let's imagine your AI is a family coach. You had an argument, and you asked, "AI, what should I do?"
And it must give you a piece of advice.
Even in this example, our value is a combination of personal benefit and group benefit. And this example is a radical one; in real life, we have to work with an epsilon-greedy exploration function and continuous A/B tests of AI agents.
These challenges are typically addressed with a multi-armed bandit approach that strikes a balance between exploring new strategies and exploiting known effective ones. On top of this, the existing LLMs (LLM_i) can be enhanced through Chain of Thought and recurrent LLM debates.
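A minimal epsilon-greedy bandit over LLM candidates, as a sketch of the exploration/exploitation trade-off described above. The model names are placeholders, and the random reward stands in for whatever value metric your A/B tests produce.

```python
import random

class EpsilonGreedyRouter:
    """Epsilon-greedy bandit: mostly exploit the best-known LLM, sometimes explore others."""

    def __init__(self, llms, epsilon=0.1):
        self.llms = list(llms)
        self.epsilon = epsilon
        self.counts = {llm: 0 for llm in self.llms}
        self.mean_reward = {llm: 0.0 for llm in self.llms}

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(self.llms)              # explore
        return max(self.llms, key=self.mean_reward.get)  # exploit

    def update(self, llm, reward):
        self.counts[llm] += 1
        n = self.counts[llm]
        # Incremental mean update of the observed value for this LLM.
        self.mean_reward[llm] += (reward - self.mean_reward[llm]) / n

router = EpsilonGreedyRouter(["gpt-4-like", "small-model", "code-model"])
for _ in range(100):
    llm = router.select()
    reward = random.random()  # stand-in for the value measured in an A/B test
    router.update(llm, reward)
print(router.mean_reward)
```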
If you found this discussion intriguing and are curious about exploring the possibilities of AI Router further, I'd be thrilled to dive deeper into this topic with you. The engineering challenges we face in advancing AI technologies are complex, but it's precisely these challenges that fuel our progress and innovation. If you're interested in a pilot project or simply wish to exchange ideas, please don't hesitate to get in touch.
Together, we have the potential to shape a brighter, more innovative future.