## The Problem

Calling multiple LLMs involves provider/model-specific configurations. Even if you unify the I/O, you still need a way to handle model/provider-specific edge cases.

We faced this last week when Anthropic told us we'd violated their content policy, since we provide our community access to LLMs like Claude-2 through our open-source proxy server. Checking every query through the OpenAI moderations endpoint would slow down all queries, so we only wanted to run this check on Anthropic's models:

```python
if model in ["claude-instant-1", "claude-2"]:
    # run moderations check
    return litellm.completion(model, messages)
```

But conditional logic like this leads to bugs. We'd faced this exact issue before, and had built LiteLLM (an abstraction library that simplifies LLM API calls) to solve this problem for us.

tldr; We didn't want conditional logic on our server. We needed a way to control which models/providers the check ran for.

## The Solution: Config Files

Our solution was to have LiteLLM handle this for us, and to control its logic through a config file. This removed conditional logic from our server code, while still allowing us to control provider/model-specific details. It also enabled us to handle other scenarios like context window errors, max tokens, etc.

Here's our complete code:

```python
import os
from litellm import completion_with_config

config = {
    "default_fallback_models": ["gpt-3.5-turbo", "claude-instant-1", "j2-ultra"],
    "model": {
        "claude-instant-1": {
            "needs_moderation": True
        },
        "gpt-3.5-turbo": {
            "error_handling": {
                "ContextWindowExceededError": {"fallback_model": "gpt-3.5-turbo-16k"}
            }
        }
    }
}

# set env vars
os.environ["OPENAI_API_KEY"] = "sk-litellm-5b46387675a944d2"     # [OPTIONAL] replace with your openai key
os.environ["ANTHROPIC_API_KEY"] = "sk-litellm-5b46387675a944d2"  # [OPTIONAL] replace with your anthropic key

sample_text = "how does a court case get to the Supreme Court?" * 1000
messages = [{"content": sample_text, "role": "user"}]
response = completion_with_config(model="gpt-3.5-turbo", messages=messages, config=config)
print(response)  # should be gpt-3.5-turbo-16k
```

Config files currently manage:

- **Prompt logic** - picking the right model for a given prompt, as well as trimming a prompt if it's larger than any available model can handle
- **Fallback logic** - letting you set default fallbacks as well as model-specific ones (e.g. the context window error above)
- **Moderations** - if a provider (e.g. Anthropic) requires you to moderate your requests

Over time, this will handle other model-specific parameters like setting max tokens, prompt formatting, etc. Ideas/suggestions are welcome!

## Conclusion

LiteLLM already simplified calling LLM providers with a drop-in replacement for the OpenAI ChatCompletion endpoint. With config files, it can now let you add new models in production without changing any server-side code.

Overall, LiteLLM is an excellent choice for anyone looking to add non-OpenAI models in production quickly and easily. We are actively trying to grow this project, so no matter your skill level we welcome contributions! Open up an issue if you find missing features/bugs, or contribute to existing issues. Star us on GitHub if you want to follow our progress as new updates come.
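To make the config-driven pattern concrete, here is a minimal, self-contained sketch of how per-model moderation flags and error-handling fallbacks can drive dispatch. This is an illustration of the general pattern, not LiteLLM's actual internals; `run_moderation_check` and `call_model` are hypothetical stand-ins for real provider calls.

```python
# Sketch of config-driven dispatch: moderation and fallback logic live in a
# config dict instead of conditional logic in server code. All names below
# are illustrative stand-ins, not LiteLLM APIs.

CONFIG = {
    "model": {
        "claude-instant-1": {"needs_moderation": True},
        "gpt-3.5-turbo": {
            "error_handling": {
                "ContextWindowExceededError": {"fallback_model": "gpt-3.5-turbo-16k"}
            }
        },
    },
}

class ContextWindowExceededError(Exception):
    pass

def run_moderation_check(messages):
    # Stand-in: a real implementation might call a moderations endpoint.
    return True

def call_model(model, messages, max_context_chars=4000):
    # Stand-in for a provider call; raises if the prompt is "too long".
    if len(messages[0]["content"]) > max_context_chars:
        raise ContextWindowExceededError(model)
    return {"model": model, "ok": True}

def completion_with_config(model, messages, config):
    model_cfg = config.get("model", {}).get(model, {})
    # 1. Moderation: only run the check when the config asks for it.
    if model_cfg.get("needs_moderation") and not run_moderation_check(messages):
        raise ValueError("request failed moderation check")
    # 2. Error handling: map exception class names to fallback models.
    try:
        return call_model(model, messages)
    except Exception as exc:
        handlers = model_cfg.get("error_handling", {})
        fallback = handlers.get(type(exc).__name__, {}).get("fallback_model")
        if fallback:
            return call_model(fallback, messages, max_context_chars=16000)
        raise

# An oversized prompt triggers the configured fallback model.
messages = [{"content": "x" * 5000, "role": "user"}]
print(completion_with_config("gpt-3.5-turbo", messages, CONFIG))
# falls back to gpt-3.5-turbo-16k
```

The point of the pattern: adding a new model or changing its moderation/fallback behavior only touches the config dict, never the dispatch code.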