Releases: BerriAI/litellm
v1.23.8
Full Changelog: v1.23.7...v1.23.8
v1.23.7
What's Changed
- [FEAT] ui - view total proxy spend / budget by @ishaan-jaff in #1915
- [FEAT] Bedrock set timeouts on litellm.completion by @ishaan-jaff in #1919
- [FEAT] Use LlamaIndex with Proxy - Support azure deployments for /embeddings - by @ishaan-jaff in #1921
- [FIX] Verbose Logger - don't double print CURL command by @ishaan-jaff in #1924
- [FEAT] Set timeout for bedrock on proxy by @ishaan-jaff in #1922
- feat(proxy_server.py): show admin global spend as time series data by @krrishdholakia in #1920
1. Bedrock Set Timeouts
Usage - litellm.completion
import litellm

response = litellm.completion(
    model="bedrock/anthropic.claude-instant-v1",
    timeout=0.01,
    messages=[{"role": "user", "content": "hello, write a 20 pg essay"}],
)
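With a timeout this aggressive the call is expected to fail fast; here is a minimal sketch of handling that, assuming litellm surfaces it via its exception mapping as litellm.exceptions.Timeout:

import litellm

try:
    response = litellm.completion(
        model="bedrock/anthropic.claude-instant-v1",
        timeout=0.01,  # deliberately tiny so the request times out
        messages=[{"role": "user", "content": "hello, write a 20 pg essay"}],
    )
except litellm.exceptions.Timeout as e:
    # hypothetical fallback: retry with a larger timeout or route to another deployment
    print(f"Bedrock call timed out: {e}")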
Usage on Proxy config.yaml
model_list:
  - model_name: BEDROCK_GROUP
    litellm_params:
      model: bedrock/cohere.command-text-v14
      timeout: 0.0001
2. View total proxy spend / budget
3. Use LlamaIndex with Proxy - Support azure deployments for /embeddings
Send Embedding requests like this
http://0.0.0.0:4000/openai/deployments/azure-embedding-model/embeddings?api-version=2023-07-01-preview
This allows users to use LlamaIndex's AzureOpenAI client with LiteLLM.
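For a quick check without LlamaIndex, the same route can be exercised with the OpenAI SDK's AzureOpenAI client pointed at the proxy. A minimal sketch; the endpoint, key, and model name below are the placeholder values from this example:

from openai import AzureOpenAI

# point the Azure client at the LiteLLM proxy; it builds the
# /openai/deployments/<model>/embeddings?api-version=... route shown above
client = AzureOpenAI(
    azure_endpoint="http://0.0.0.0:4000",
    api_key="sk-1234",
    api_version="2023-07-01-preview",
)

response = client.embeddings.create(
    model="azure-embedding-model",  # model_name / deployment set on the proxy
    input=["good morning from litellm"],
)
print(response.data[0].embedding[:5])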
Use LlamaIndex with LiteLLM Proxy
import os
from dotenv import load_dotenv

load_dotenv()

from llama_index.llms import AzureOpenAI
from llama_index.embeddings import AzureOpenAIEmbedding
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext

llm = AzureOpenAI(
    engine="azure-gpt-3.5",
    temperature=0.0,
    azure_endpoint="http://0.0.0.0:4000",
    api_key="sk-1234",
    api_version="2023-07-01-preview",
)

embed_model = AzureOpenAIEmbedding(
    deployment_name="azure-embedding-model",
    azure_endpoint="http://0.0.0.0:4000",
    api_key="sk-1234",
    api_version="2023-07-01-preview",
)

# response = llm.complete("The sky is a beautiful blue and")
# print(response)

documents = SimpleDirectoryReader("llama_index_data").load_data()

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
Full Changelog: v1.23.5...v1.23.7
v1.23.5
What's Changed
- fix(proxy_server.py): enable aggregate queries via /spend/keys by @krrishdholakia in #1901
- fix(factory.py): mistral message input fix by @krrishdholakia in #1902
Full Changelog: v1.23.4...v1.23.5
v1.23.4
What's Changed
- [FEAT] 76 % Faster s3 logging Proxy / litellm.acompletion / router.acompletion 🚀 by @ishaan-jaff in #1892
- (feat) Add support for AWS credentials from profile file by @dleen in #1895
- Litellm langfuse error logging - log input by @krrishdholakia in #1898
- Admin UI - View Models, TPM, RPM Limit of a Key by @ishaan-jaff in #1903
- Admin UI - show delete confirmation when deleting keys by @ishaan-jaff in #1904
Full Changelog: v1.23.3...v1.23.4
v1.23.3
What's Changed
- [FEAT] 78% Faster s3 Cache⚡️- Proxy/ litellm.acompletion/ litellm.Router.acompletion by @ishaan-jaff in #1891
Full Changelog: v1.23.2...v1.23.3
v1.23.2
What's Changed 🐬
- [FEAT] Azure Pricing - based on base_model in model_info
- [Feat] Semantic Caching - Track Cost of using embedding, Use Langfuse Trace ID
- [Feat] Slack Alert when budget tracking fails
1. [FEAT] Azure Pricing - based on base_model in model_info by @ishaan-jaff in #1874
Azure Pricing - Use Base model for cost calculation
Why?
Azure returns gpt-4 in the response when azure/gpt-4-1106-preview is used, so we were previously using gpt-4 pricing when calculating response_cost.
How to use - set base_model on config.yaml
model_list:
  - model_name: azure-gpt-3.5
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"
    model_info:
      base_model: azure/gpt-4-1106-preview
View Cost calculated on Langfuse
This used the correct pricing for azure/gpt-4-1106-preview:
cost = (9 prompt tokens * $0.00001) + (28 completion tokens * $0.00003)
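A minimal sketch of the same arithmetic, assuming litellm.cost_per_token is used with the base model name (the token counts are the ones from the example above):

import litellm

# cost for 9 prompt tokens and 28 completion tokens at gpt-4-1106-preview rates
prompt_cost, completion_cost = litellm.cost_per_token(
    model="azure/gpt-4-1106-preview",
    prompt_tokens=9,
    completion_tokens=28,
)
print(prompt_cost + completion_cost)  # ~= (9 * 0.00001) + (28 * 0.00003)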
2. [Feat] Semantic Caching - Track Cost of using embedding, Use Langfuse Trace ID by @ishaan-jaff in #1878
- If a trace_id is passed, we'll place the semantic cache embedding call in the same trace (see the sketch below)
- We now track cost for the API key that makes the embedding call for semantic caching
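A minimal sketch of passing an existing Langfuse trace_id, assuming it is forwarded through the metadata param as in litellm's Langfuse integration (the trace id below is a placeholder):

import litellm
from litellm import completion

litellm.success_callback = ["langfuse"]  # log calls (including cache embedding calls) to Langfuse

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "what's the weather in SF"}],
    metadata={"trace_id": "my-existing-trace-id"},  # hypothetical existing trace
)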
3. [Feat] Slack Alert when budget tracking fails by @ishaan-jaff in #1877
Full Changelog: v1.23.1...v1.23.2
v1.23.1
What's Changed
- [Feat] add azure/gpt-4-0125-preview by @ishaan-jaff in #1876
Full Changelog: v1.23.0...v1.23.1
v1.23.0
What's Changed
- feat(ui): enable admin to view all valid keys created on the proxy by @krrishdholakia in #1843
- fix(proxy_server.py): prisma client fixes for high traffic by @krrishdholakia in #1860
Full Changelog: v1.22.11...v1.23.0
v1.22.11
Full Changelog: v1.22.10...v1.22.11
v1.22.10
What's Changed
- fix(proxy_server.py): do a health check on db before returning if proxy ready (if db connected) by @krrishdholakia in #1856
- fix(utils.py): return finish reason for last vertex ai chunk by @krrishdholakia in #1847
- fix(proxy/utils.py): if langfuse trace id passed in, include in slack alert by @krrishdholakia in #1839
- [Feat] Budgets for 'user' param passed to /chat/completions, /embeddings etc by @ishaan-jaff in #1859
Semantic Caching Support - Add Semantic Caching to litellm💰 by @ishaan-jaff in #1829
- Use with LiteLLM Proxy https://docs.litellm.ai/docs/proxy/caching
- Use with litellm.completion https://docs.litellm.ai/docs/caching/redis_cache
Usage with Proxy
Step 1: Add cache to the config.yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
  - model_name: azure-embedding-model
    litellm_params:
      model: azure/azure-embedding-model
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2023-07-01-preview"

litellm_settings:
  set_verbose: True
  cache: True # set cache responses to True, litellm defaults to using a redis cache
  cache_params:
    type: "redis-semantic"
    similarity_threshold: 0.8 # similarity threshold for semantic cache
    redis_semantic_cache_embedding_model: azure-embedding-model # set this to a model_name set in model_list
Step 2: Add Redis Credentials to .env
Set either REDIS_URL or REDIS_HOST in your OS environment to enable caching.
REDIS_URL = "" # REDIS_URL='redis://username:password@hostname:port/database'
## OR ##
REDIS_HOST = "" # REDIS_HOST='redis-18841.c274.us-east-1-3.ec2.cloud.redislabs.com'
REDIS_PORT = "" # REDIS_PORT='18841'
REDIS_PASSWORD = "" # REDIS_PASSWORD='liteLlmIsAmazing'
Additional kwargs
You can pass in any additional redis.Redis arg by storing the variable + value in your OS environment, like this:
REDIS_<redis-kwarg-name> = ""
Step 3: Run proxy with config
$ litellm --config /path/to/config.yaml
That's it!
(You'll see semantic-similarity on Langfuse if you set langfuse as a success_callback.)
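To sanity-check the cache end to end, here is a minimal sketch using the OpenAI SDK against the proxy (the endpoint and key are the placeholder values from this example; the second, semantically similar request should be served from the cache):

from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# two semantically similar prompts - the second should hit the redis-semantic cache
for prompt in ["write a one sentence poem about the moon",
               "write a 1 sentence poem about the moon"]:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)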
Usage with litellm.completion
import os
import random

import litellm
from litellm import completion
from litellm.caching import Cache

litellm.cache = Cache(
    type="redis-semantic",
    host=os.environ["REDIS_HOST"],
    port=os.environ["REDIS_PORT"],
    password=os.environ["REDIS_PASSWORD"],
    similarity_threshold=0.8,
    redis_semantic_cache_embedding_model="text-embedding-ada-002",
)

random_number = random.randint(1, 100000)
response1 = completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": f"write a one sentence poem about: {random_number}",
        }
    ],
    max_tokens=20,
)
print(f"response1: {response1}")

random_number = random.randint(1, 100000)
response2 = completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": f"write a one sentence poem about: {random_number}",
        }
    ],
    max_tokens=20,
)
print(f"response2: {response2}")

assert response1.id == response2.id  # semantic cache hit - the same cached response is returned
Budgets for 'user' param passed to /chat/completions, /embeddings etc
Set budgets for the 'user' param passed to /chat/completions, without needing to create a key for every user.
docs: https://docs.litellm.ai/docs/proxy/users
How to Use
- Define a litellm.max_user_budget on your config
litellm_settings:
  max_budget: 10 # global budget for proxy
  max_user_budget: 0.0001 # budget for 'user' passed to /chat/completions
- Make a /chat/completions call, pass 'user' - First call Works
curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
    --data '{
        "model": "azure-gpt-3.5",
        "user": "ishaan3",
        "messages": [
            {
                "role": "user",
                "content": "what time is it"
            }
        ]
    }'
- Make a /chat/completions call, pass 'user' - Call Fails, since 'ishaan3' over budget
curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Content-Type: application/json' \
    --header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
    --data '{
        "model": "azure-gpt-3.5",
        "user": "ishaan3",
        "messages": [
            {
                "role": "user",
                "content": "what time is it"
            }
        ]
    }'
Error
{"error":{"message":"Authentication Error, ExceededBudget: User ishaan3 has exceeded their budget. Current spend: 0.0008869999999999999; Max Budget: 0.0001","type":"auth_error","param":"None","code":401}}%
Full Changelog: v1.22.9...v1.22.10