Calling multiple LLMs involves provider/model-specific configurations. Even if you unify the I/O, you still need a way to handle model/provider-specific edge-cases.
We ran into this last week when Anthropic told us we’d violated their content policy, since we provide our community access to LLMs like Claude-2 through our open-source proxy server.
Routing every request through the OpenAI moderations endpoint would add latency, so we only wanted to run this check on Anthropic’s models.
if model in ["claude-instant-1", "claude-2"]:
    # run moderations check first
    ...
return litellm.completion(model, messages)
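Spelled out, that conditional looks something like the sketch below. The `guarded_completion` and `moderate` names are hypothetical (not part of LiteLLM); the completion call and moderation check are passed in so the routing logic stands on its own:

```python
# Models routed through a moderation check first (assumption: these are the
# Anthropic models our proxy exposes)
MODERATED_MODELS = {"claude-instant-1", "claude-2"}

def needs_moderation(model: str) -> bool:
    # Only Anthropic models get the extra check, so OpenAI calls stay fast
    return model in MODERATED_MODELS

def guarded_completion(model, messages, complete, moderate):
    """Run the moderation check for flagged models, then call the LLM.

    `complete` is the LLM call (e.g. litellm.completion) and `moderate` is a
    hypothetical wrapper around the OpenAI moderations endpoint that raises
    on a policy violation; both are injected to keep this sketch runnable.
    """
    if needs_moderation(model):
        moderate(messages[-1]["content"])
    return complete(model=model, messages=messages)
```

Every new provider-specific rule adds another branch like this to the server code, which is exactly the sprawl we wanted to avoid.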
But conditional logic like this leads to bugs. We’d faced this exact issue before, and had built LiteLLM (an abstraction library that simplifies LLM API calls) to solve this problem for us.
tl;dr
Our solution was to have LiteLLM handle this for us and control its logic through a config file. This removed conditional logic from our server code, while still letting us control provider/model-specific details.
This also enabled us to handle other scenarios like context window errors, max tokens, etc.
Here’s our complete code:
import os
from litellm import completion_with_config

config = {
    "default_fallback_models": ["gpt-3.5-turbo", "claude-instant-1", "j2-ultra"],
    "model": {
        "claude-instant-1": {
            "needs_moderation": True
        },
        "gpt-3.5-turbo": {
            "error_handling": {
                "ContextWindowExceededError": {"fallback_model": "gpt-3.5-turbo-16k"}
            }
        }
    }
}

# set env vars
os.environ["OPENAI_API_KEY"] = "sk-litellm-5b46387675a944d2" # [OPTIONAL] replace with your openai key
os.environ["ANTHROPIC_API_KEY"] = "sk-litellm-5b46387675a944d2" # [OPTIONAL] replace with your anthropic key

# a prompt long enough to overflow gpt-3.5-turbo's context window
sample_text = "how does a court case get to the Supreme Court?" * 1000
messages = [{"content": sample_text, "role": "user"}]

response = completion_with_config(model="gpt-3.5-turbo", messages=messages, config=config)
print(response) # served by gpt-3.5-turbo-16k, per the fallback config
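The idea underneath is simple: before each call, look up the model’s entry in the config and act on flags like `needs_moderation` or `error_handling`. A minimal sketch of that lookup (our own illustration, not LiteLLM’s actual internals):

```python
# Same shape as the config passed to completion_with_config above
config = {
    "model": {
        "claude-instant-1": {"needs_moderation": True},
        "gpt-3.5-turbo": {
            "error_handling": {
                "ContextWindowExceededError": {"fallback_model": "gpt-3.5-turbo-16k"}
            }
        },
    }
}

def model_settings(config, model):
    # Per-model settings; empty dict if the model has no entry
    return config.get("model", {}).get(model, {})

def requires_moderation(config, model):
    return model_settings(config, model).get("needs_moderation", False)

def fallback_for(config, model, error_name):
    # Which model to retry with when `error_name` is raised, if any
    handlers = model_settings(config, model).get("error_handling", {})
    return handlers.get(error_name, {}).get("fallback_model")
```

Because the decisions live in data rather than in `if`/`else` branches, adding a rule for a new model is a config edit, not a code change.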
Config files currently manage:

- per-model moderation checks (the needs_moderation flag)
- per-model error handling and fallbacks (e.g. routing ContextWindowExceededError to gpt-3.5-turbo-16k)
- default fallback models across providers
Over time, this will handle other model-specific parameters like setting max tokens, prompt formatting, etc. Ideas/suggestions are welcome!
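As a rough sketch of where this could go, the entry below extends the config with keys for max tokens and prompt formatting. These keys are hypothetical, not part of LiteLLM’s current config schema:

```python
config = {
    "model": {
        "claude-instant-1": {
            "needs_moderation": True,          # supported today
            # hypothetical future keys (not in LiteLLM's schema):
            "max_tokens": 4096,                # cap completion length per model
            "prompt_template": "\n\nHuman: {prompt}\n\nAssistant:",  # provider-specific formatting
        }
    }
}
```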
LiteLLM already simplified calling LLM providers, with a drop-in replacement for the OpenAI ChatCompletion endpoint.
With config files, it can now let you add new models in production, without changing any server-side code.
Overall, LiteLLM is an excellent choice for anyone looking to add non-OpenAI models in production quickly and easily.
We are actively trying to grow this project, so we welcome contributions at any skill level! Open an issue if you find missing features or bugs, or contribute to existing issues. Star us on GitHub to follow our progress as new updates come out.