Calling multiple LLMs involves provider/model-specific configurations. Even if you unify the I/O, you still need a way to handle model/provider-specific edge-cases.
We ran into this last week when Anthropic told us we’d violated their content policy, since we provide our community access to LLMs like Claude-2 through our open-source proxy server.
Routing every request through the OpenAI moderations endpoint would add latency, so we only wanted to run this check on Anthropic’s models.
if model in ["claude-instant-1", "claude-2"]:
    # run moderations check first
    ...
return litellm.completion(model, messages)
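Spelled out, that conditional looks something like the sketch below. The `guarded_completion` and `moderate` names are hypothetical (not part of LiteLLM); the completion call and moderation check are passed in so the routing logic stands on its own:

```python
# Models routed through a moderation check first (assumption: these are the
# Anthropic models our proxy exposes)
MODERATED_MODELS = {"claude-instant-1", "claude-2"}

def needs_moderation(model: str) -> bool:
    # Only Anthropic models get the extra check, so OpenAI calls stay fast
    return model in MODERATED_MODELS

def guarded_completion(model, messages, complete, moderate):
    """Run the moderation check for flagged models, then call the LLM.

    `complete` is the LLM call (e.g. litellm.completion) and `moderate` is a
    hypothetical wrapper around the OpenAI moderations endpoint that raises
    on a policy violation; both are injected to keep this sketch runnable.
    """
    if needs_moderation(model):
        moderate(messages[-1]["content"])
    return complete(model=model, messages=messages)
```

Every new provider-specific rule adds another branch like this to the server code, which is exactly the sprawl we wanted to avoid.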
But conditional logic like this leads to bugs. We’d faced this exact issue before, and had built LiteLLM (an abstraction library that simplifies LLM API calls) to solve this problem for us.
tl;dr
Our solution was to have LiteLLM handle this for us and control its logic through a config file. This removed conditional logic from our server code, while still letting us control provider/model-specific details.
This also enabled us to handle other scenarios like context window errors, max tokens, etc.
Here’s our complete code:
import os
from litellm import completion_with_config

config = {
    "default_fallback_models": ["gpt-3.5-turbo", "claude-instant-1", "j2-ultra"],
    "model": {
        "claude-instant-1": {
            "needs_moderation": True
        },
        "gpt-3.5-turbo": {
            "error_handling": {
                "ContextWindowExceededError": {"fallback_model": "gpt-3.5-turbo-16k"}
            }
        }
    }
}

# set env vars
os.environ["OPENAI_API_KEY"] = "sk-litellm-5b46387675a944d2" # [OPTIONAL] replace with your openai key
os.environ["ANTHROPIC_API_KEY"] = "sk-litellm-5b46387675a944d2" # [OPTIONAL] replace with your anthropic key

# a prompt long enough to overflow gpt-3.5-turbo's context window
sample_text = "how does a court case get to the Supreme Court?" * 1000
messages = [{"content": sample_text, "role": "user"}]

response = completion_with_config(model="gpt-3.5-turbo", messages=messages, config=config)
print(response) # served by gpt-3.5-turbo-16k, per the fallback config
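The idea underneath is simple: before each call, look up the model’s entry in the config and act on flags like `needs_moderation` or `error_handling`. A minimal sketch of that lookup (our own illustration, not LiteLLM’s actual internals):

```python
# Same shape as the config passed to completion_with_config above
config = {
    "model": {
        "claude-instant-1": {"needs_moderation": True},
        "gpt-3.5-turbo": {
            "error_handling": {
                "ContextWindowExceededError": {"fallback_model": "gpt-3.5-turbo-16k"}
            }
        },
    }
}

def model_settings(config, model):
    # Per-model settings; empty dict if the model has no entry
    return config.get("model", {}).get(model, {})

def requires_moderation(config, model):
    return model_settings(config, model).get("needs_moderation", False)

def fallback_for(config, model, error_name):
    # Which model to retry with when `error_name` is raised, if any
    handlers = model_settings(config, model).get("error_handling", {})
    return handlers.get(error_name, {}).get("fallback_model")
```

Because the decisions live in data rather than in `if`/`else` branches, adding a rule for a new model is a config edit, not a code change.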
Config files currently manage:

- per-model moderation checks (the needs_moderation flag)
- per-model error handling and fallbacks (e.g. routing ContextWindowExceededError to gpt-3.5-turbo-16k)
- default fallback models across providers
Over time, this will handle other model-specific parameters like setting max tokens, prompt formatting, etc. Ideas/suggestions are welcome!
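As a rough sketch of where this could go, the entry below extends the config with keys for max tokens and prompt formatting. These keys are hypothetical, not part of LiteLLM’s current config schema:

```python
config = {
    "model": {
        "claude-instant-1": {
            "needs_moderation": True,          # supported today
            # hypothetical future keys (not in LiteLLM's schema):
            "max_tokens": 4096,                # cap completion length per model
            "prompt_template": "\n\nHuman: {prompt}\n\nAssistant:",  # provider-specific formatting
        }
    }
}
```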
LiteLLM already simplified calling LLM providers, with a drop-in replacement for the OpenAI ChatCompletion endpoint.
With config files, it can now let you add new models in production, without changing any server-side code.
Overall, LiteLLM is an excellent choice for anyone looking to add non-OpenAI models in production quickly and easily.
We are actively trying to grow this project, so we welcome contributions at any skill level! Open an issue if you find missing features or bugs, or contribute to existing issues. Star us on GitHub to follow our progress as new updates come out.