Calling multiple LLM providers means messy code: each provider has its own package and its own input/output formats. LangChain is too bloated and doesn't provide consistent I/O across all LLM APIs.
I remember distinctly when we added support for Azure and Cohere to our 'chat-with-your-data' application. APIs can fail (e.g. Azure ReadTimeout errors), so we wrote a fallback strategy to iterate through a list of models in case one failed (e.g. if Azure fails, try Cohere first, OpenAI second, etc.).

Provider-specific implementations meant our for-loops became increasingly large (think: multiple ~100-line if/else statements), and since we made LLM API calls in multiple places in our code, our debugging problems exploded, because we now had multiple copies of these for-loop chunks scattered across our codebase.
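To make that concrete, here's an illustrative sketch (not our actual code; client setup and parameters are simplified, and the Azure deployment name is hypothetical) of what each of those provider-specific fallback loops looked like:

```python
import os
import openai  # pre-1.0 SDK style
import cohere

def call_with_fallback(prompt: str) -> str:
    # every provider needs its own branch, its own client, and its own response parsing
    for provider in ["azure", "cohere", "openai"]:
        try:
            if provider == "azure":
                # azure-specific config (api_type, api_base, api_version) omitted for brevity
                response = openai.ChatCompletion.create(
                    engine="my-azure-deployment",  # hypothetical deployment name
                    messages=[{"role": "user", "content": prompt}],
                )
                return response["choices"][0]["message"]["content"]
            elif provider == "cohere":
                co = cohere.Client(os.environ["COHERE_API_KEY"])
                response = co.generate(model="command-nightly", prompt=prompt)
                return response.generations[0].text
            elif provider == "openai":
                response = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    messages=[{"role": "user", "content": prompt}],
                )
                return response["choices"][0]["message"]["content"]
        except Exception:
            continue  # move on to the next provider
    raise RuntimeError("all providers failed")
```

Multiply that by every place in the codebase that calls an LLM, and by every new provider, and it gets unmanageable fast.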
Abstraction. We decided to abstract our API calls behind a single class. We needed I/O that just worked, so we could spend our time improving other parts of our system (error-handling/model-fallback logic, etc.).

This class needed to do 3 things really well:

- accept the same input format for every provider
- return output in the same format for every provider
- surface errors and logs consistently, so fallback logic only has to be written once
That’s when we built LiteLLM - a simple package to call Azure, Anthropic, OpenAI, Cohere and Replicate.
```python
import os
from litellm import completion

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["COHERE_API_KEY"] = "cohere key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
```
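And because every provider now shares one call signature, the fallback strategy from earlier collapses into a single provider-agnostic loop. A minimal sketch (the model list, including the Azure deployment name, is illustrative):

```python
from litellm import completion

def completion_with_fallbacks(messages, models=None):
    # try each model in order; same call for every provider
    models = models or ["azure/my-deployment", "gpt-3.5-turbo", "command-nightly"]
    for model in models:
        try:
            return completion(model=model, messages=messages)
        except Exception:
            continue  # this provider failed; fall back to the next one
    raise RuntimeError("all providers failed")
```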
It’s already live in production for us (and 500+ others) and has handled 50k+ queries.
LiteLLM manages:

- Calling all LLM APIs using the same input format: `completion(model, messages)`
- Consistent output: the text response is always available at `['choices'][0]['message']['content']`
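Which means the same parsing code works no matter which provider served the request:

```python
from litellm import completion

messages = [{"content": "Hello, how are you?", "role": "user"}]

# identical access path whether the model is from OpenAI, Cohere, Azure, etc.
response = completion(model="command-nightly", messages=messages)
print(response['choices'][0]['message']['content'])
```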
In case of error, LiteLLM also provides:

- Logging: plug in your own logging function with `completion(.., logger_fn=your_logging_fn)`, and/or turn on print statements from the package with `litellm.set_verbose = True`
- Callbacks: `litellm.success_callbacks`, `litellm.failure_callbacks` (see Callbacks)
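A rough sketch of wiring these up; the `logger_fn` payload and the callback signatures are assumptions here, so check the Callbacks docs for the exact shapes:

```python
import litellm
from litellm import completion

litellm.set_verbose = True  # print debug statements from the package

def my_logging_fn(model_call_dict):
    # receives details of the model call (exact payload shape assumed; see the docs)
    print(f"LLM call details: {model_call_dict}")

def my_success_handler(*args, **kwargs):
    print("call succeeded")  # e.g. forward to your analytics/logging stack

def my_failure_handler(*args, **kwargs):
    print("call failed")  # e.g. alert, or record for debugging

# attribute names taken from the list above; handler signatures are placeholders
litellm.success_callbacks = [my_success_handler]
litellm.failure_callbacks = [my_failure_handler]

messages = [{"content": "Hello, how are you?", "role": "user"}]
response = completion(model="gpt-3.5-turbo", messages=messages, logger_fn=my_logging_fn)
```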
LiteLLM simplifies calling LLM providers, with a drop-in replacement for the OpenAI ChatCompletion endpoint. That makes it easy to add new models to your system in minutes, reusing the same exception-handling, token logic, etc. you already wrote for OpenAI.