Skip to content

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]

License

Notifications You must be signed in to change notification settings

BerriAI/litellm

Folders and files

NameName
Last commit message
Last commit date
Feb 6, 2024
Feb 5, 2024
Feb 5, 2024
Feb 8, 2024
Jan 25, 2024
Jan 12, 2024
Jan 6, 2024
Feb 8, 2024
Feb 9, 2024
Feb 7, 2024
Feb 9, 2024
Aug 28, 2023
Dec 23, 2023
Aug 31, 2023
Feb 5, 2024
Feb 5, 2024
Feb 7, 2024
Jan 10, 2024
Feb 7, 2024
Jul 27, 2023
Feb 8, 2024
Jan 26, 2024
Jan 10, 2024
Feb 8, 2024
Dec 27, 2023
Jan 26, 2024
Feb 4, 2024
Feb 8, 2024
Feb 6, 2024
Jan 6, 2024
Feb 9, 2024
Nov 23, 2023

Repository files navigation

🚅 LiteLLM

Call all LLM APIs using the OpenAI format [Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, etc.]

LiteLLM manages:

  • Translate inputs to provider's completion, embedding, and image_generation endpoints
  • Consistent output, text responses will always be available at ['choices'][0]['message']['content']
  • Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - Router

Jump to OpenAI Proxy Docs
Jump to Supported LLM Providers

Usage (Docs)

Important

LiteLLM v1.0.0 now requires openai>=1.0.0. Migration guide here

Open In Colab
pip install litellm
from litellm import completion
import os

## set ENV variables 
os.environ["OPENAI_API_KEY"] = "your-openai-key" 
os.environ["COHERE_API_KEY"] = "your-cohere-key" 

messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)

Async (Docs)

from litellm import acompletion
import asyncio

async def test_get_response():
    user_message = "Hello, how are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(model="gpt-3.5-turbo", messages=messages)
    return response

response = asyncio.run(test_get_response())
print(response)

Streaming (Docs)

liteLLM supports streaming the model response back, pass stream=True to get a streaming iterator in response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)

from litellm import completion
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")

# claude 2
response = completion('claude-2', messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")

Logging Observability (Docs)

LiteLLM exposes pre defined callbacks to send data to Langfuse, DynamoDB, s3 Buckets, LLMonitor, Helicone, Promptlayer, Traceloop, Slack

from litellm import completion

## set env variables for logging tools
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id"

os.environ["OPENAI_API_KEY"]

# set callbacks
litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langfuse, llmonitor, supabase

#openai call
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])

OpenAI Proxy - (Docs)

Track spend across multiple projects/people

The proxy provides:

  1. Hooks for auth
  2. Hooks for logging
  3. Cost tracking
  4. Rate Limiting

📖 Proxy Endpoints - Swagger Docs

Quick Start Proxy - CLI

pip install 'litellm[proxy]'

Step 1: Start litellm proxy

$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:8000

Step 2: Make ChatCompletions Request to Proxy

import openai # openai v1.0.0+
client = openai.OpenAI(api_key="anything",base_url="http://0.0.0.0:8000") # set proxy to base_url
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)

Proxy Key Management (Docs)

Track Spend, Set budgets and create virtual keys for the proxy POST /key/generate

Request

curl 'http://0.0.0.0:8000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["gpt-3.5-turbo", "gpt-4", "claude-2"], "duration": "20m","metadata": {"user": "ishaan@berri.ai", "team": "core-infra"}}'

Expected Response

{
    "key": "sk-kdEXbIqZRwEeEiHwdg7sFA", # Bearer token
    "expires": "2023-11-19T01:38:25.838000+00:00" # datetime object
}

[Beta] Proxy UI (Docs)

A UI to create keys, track spend per key

Code: https://github.com/BerriAI/litellm/tree/main/ui
ui_3

Supported Providers (Docs)

Provider Completion Streaming Async Completion Async Streaming Async Embedding Async Image Generation
openai
azure
aws - sagemaker
aws - bedrock
google - vertex_ai [Gemini]
google - palm
google AI Studio - gemini
mistral ai api
cloudflare AI Workers
cohere
anthropic
huggingface
replicate
together_ai
openrouter
ai21
baseten
vllm
nlp_cloud
aleph alpha
petals
ollama
deepinfra
perplexity-ai
anyscale
voyage ai
xinference [Xorbits Inference]

Read the Docs

Contributing

To contribute: Clone the repo locally -> Make a change -> Submit a PR with the change.

Here's how to modify the repo locally: Step 1: Clone the repo

git clone https://github.com/BerriAI/litellm.git

Step 2: Navigate into the project, and install dependencies:

cd litellm
poetry install

Step 3: Test your change:

cd litellm/tests # pwd: Documents/litellm/litellm/tests
poetry run flake8
poetry run pytest .

Step 4: Submit a PR with your changes! 🚀

  • push your fork to your GitHub repo
  • submit a PR from there

Support / talk with founders

Why did we build this

  • Need for simplicity: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI and Cohere.

Contributors