## The Problem

Calling multiple LLM providers involves messy code - each provider has its own package and different input/output formats. LangChain is too bloated and doesn't provide consistent I/O across all LLM APIs.

I remember distinctly when we added support for Azure and Cohere on our 'chat-with-your-data' application. APIs can fail (e.g. Azure read-timeout errors), so we wrote a fallback strategy to iterate through a list of models in case one failed (e.g. if Azure fails, try Cohere first, OpenAI second, etc.). Provider-specific implementations meant our for-loops became increasingly large (think: multiple ~100 line if/else statements), and since we made LLM API calls in multiple places in our code, our debugging problems exploded - because now we had multiple for-loop chunks across our codebase.

## The Solution: simplified LLM API calls

Abstraction. That's when we decided to abstract our API calls behind a single class. We needed I/O that just worked, so we could spend time improving other parts of our system (error-handling/model-fallback logic, etc.).

This class needed to do 3 things really well:

- **Consistent I/O**: Remove the need for multiple if/else statements. I can call all models the same way, and expect responses in the same format for each (including consistent exception types if a call fails).
- **Be reliable**: the class shouldn't be the reason I drop requests in prod.
- **Be observable**: No obscure errors. If a request did fail - what happened? And why?

That's when we built LiteLLM - a simple package to call Azure, Anthropic, OpenAI, Cohere and Replicate.

```python
import os
from litellm import completion

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["COHERE_API_KEY"] = "cohere key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion("command-nightly", messages)
```

It's already live in production for us (and 500+ others) and has handled 50k+ queries.

LiteLLM manages:

- **Calling all LLM APIs using the OpenAI format** - `completion(model, messages)`
- **Consistent output** (incl. token usage) for all LLM APIs - text responses will always be available at `['choices'][0]['message']['content']`
- **Consistent exceptions** for all LLM APIs - we map RateLimit, Context Window, and Authentication Error exceptions across all providers to their OpenAI equivalents (see Code)

In case of error, LiteLLM also provides:

- **Logging** - see exactly what the raw model request/response is, by plugging in your own function `completion(.., logger_fn=your_logging_fn)` and/or print statements from the package `litellm.set_verbose=True`
- **Callbacks** - automatically send your data to Sentry, Posthog, Slack, Supabase, Helicone, etc. - `litellm.success_callbacks`, `litellm.failure_callbacks` (see Callbacks)

## Conclusion

LiteLLM simplifies calling LLM providers with a drop-in replacement for the OpenAI ChatCompletion endpoint, making it easy for you to add new models to your system in minutes (using the same exception-handling, token logic, etc. you already wrote for OpenAI).
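As a closing example, here's roughly what the fallback strategy from the start of this post collapses to once every model goes through `completion()`. This is a minimal sketch, not LiteLLM's own fallback logic: the `fallback_models` list, the `completion_with_fallbacks` helper, and the broad `except` are placeholders you'd adapt to your stack.

```python
import os
from litellm import completion

# placeholder keys - set real values in your environment
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["COHERE_API_KEY"] = "cohere key"
os.environ["ANTHROPIC_API_KEY"] = "anthropic key"

# hypothetical retry order: OpenAI first, then Cohere, then Anthropic
fallback_models = ["gpt-3.5-turbo", "command-nightly", "claude-instant-1"]

def completion_with_fallbacks(messages):
    last_error = None
    for model in fallback_models:
        try:
            response = completion(model=model, messages=messages)
            # identical response shape for every provider,
            # so one access path works everywhere
            return response["choices"][0]["message"]["content"]
        except Exception as err:  # broad catch for the sketch; narrow it in prod
            last_error = err
    raise last_error

print(completion_with_fallbacks([{"content": "Hello, how are you?", "role": "user"}]))
```

No provider-specific branches, and because exceptions are mapped to OpenAI equivalents, the `except` clause can be tightened to the specific error types you care about instead of a generic catch-all.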
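And a quick sketch of the observability hooks mentioned above. The only LiteLLM pieces here are the `logger_fn` parameter and `litellm.set_verbose`, both described earlier; `my_logging_fn` is just an illustrative function, and the exact shape of what it receives (the raw call details) is an assumption.

```python
import os
import litellm
from litellm import completion

os.environ["OPENAI_API_KEY"] = "openai key"

litellm.set_verbose = True  # print statements from the package, for debugging

def my_logging_fn(model_call_details):
    # assumption: logger_fn is handed the raw model request/response details;
    # forward them to whatever logging/monitoring you already use
    print("LLM call details:", model_call_details)

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    logger_fn=my_logging_fn,
)
```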