
feat(prometheus.py): emit time_to_first_token metric on prometheus
Closes #6334
krrishdholakia committed Oct 21, 2024
1 parent 6b82255 commit 214e412
Showing 3 changed files with 38 additions and 5 deletions.
5 changes: 3 additions & 2 deletions docs/my-website/docs/proxy/prometheus.md
@@ -134,8 +134,9 @@ Use this for LLM API Error monitoring and tracking remaining rate limits and tok

| Metric Name | Description |
|----------------------|--------------------------------------|
-| `litellm_request_total_latency_metric` | Total latency (seconds) for a request to LiteLLM Proxy Server - tracked for labels `litellm_call_id`, `model`, `user_api_key`, `user_api_key_alias`, `user_api_team`, `user_api_team_alias` |
-| `litellm_llm_api_latency_metric` | Latency (seconds) for just the LLM API call - tracked for labels `litellm_call_id`, `model`, `user_api_key`, `user_api_key_alias`, `user_api_team`, `user_api_team_alias` |
+| `litellm_request_total_latency_metric` | Total latency (seconds) for a request to LiteLLM Proxy Server - tracked for labels `model`, `hashed_api_key`, `api_key_alias`, `team`, `team_alias` |
+| `litellm_llm_api_latency_metric` | Latency (seconds) for just the LLM API call - tracked for labels `model`, `hashed_api_key`, `api_key_alias`, `team`, `team_alias` |
+| `litellm_llm_api_time_to_first_token_metric` | Time to first token for LLM API call - tracked for labels `model`, `hashed_api_key`, `api_key_alias`, `team`, `team_alias` |

## Virtual Key - Budget, Rate Limit Metrics

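Once exported, the new metric is scraped from the proxy's `/metrics` endpoint in the Prometheus text exposition format, where a histogram appears as cumulative `_bucket` series keyed by the `le` label. A minimal sketch of pulling those bucket counts out of such output (the sample lines and label values below are hypothetical, not real proxy output):

```python
# Parse hypothetical /metrics text for the new histogram; the sample output
# and label values are illustrative only.
sample = """\
litellm_llm_api_time_to_first_token_metric_bucket{model="gpt-3.5-turbo",le="0.5"} 7
litellm_llm_api_time_to_first_token_metric_bucket{model="gpt-3.5-turbo",le="1.0"} 9
litellm_llm_api_time_to_first_token_metric_bucket{model="gpt-3.5-turbo",le="+Inf"} 10
litellm_llm_api_time_to_first_token_metric_count{model="gpt-3.5-turbo"} 10
"""

def bucket_counts(text: str, metric: str) -> dict:
    """Map each le= upper bound to its cumulative count for one metric."""
    out = {}
    for line in text.splitlines():
        if line.startswith(metric + "_bucket"):
            le = line.split('le="')[1].split('"')[0]
            out[le] = int(line.rsplit(" ", 1)[1])
    return out

print(bucket_counts(sample, "litellm_llm_api_time_to_first_token_metric"))
# {'0.5': 7, '1.0': 9, '+Inf': 10}
```

Because the buckets are cumulative, the `le="0.5"` count already includes every observation under half a second, which is what PromQL's `histogram_quantile` relies on.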
29 changes: 29 additions & 0 deletions litellm/integrations/prometheus.py
@@ -97,6 +97,19 @@ def __init__(
     buckets=LATENCY_BUCKETS,
 )

+self.litellm_llm_api_time_to_first_token_metric = Histogram(
+    "litellm_llm_api_time_to_first_token_metric",
+    "Time to first token for a model's LLM API call",
+    labelnames=[
+        "model",
+        "hashed_api_key",
+        "api_key_alias",
+        "team",
+        "team_alias",
+    ],
+    buckets=LATENCY_BUCKETS,
+)

# Counter for spend
self.litellm_spend_metric = Counter(
"litellm_spend_metric",
@@ -468,6 +481,22 @@ async def async_log_success_event(  # noqa: PLR0915
 total_time_seconds = total_time.total_seconds()
 api_call_start_time = kwargs.get("api_call_start_time", None)

+completion_start_time = kwargs.get("completion_start_time", None)
+
+if completion_start_time is not None and isinstance(
+    completion_start_time, datetime
+):
+    time_to_first_token_seconds = (
+        completion_start_time - api_call_start_time
+    ).total_seconds()
+    self.litellm_llm_api_time_to_first_token_metric.labels(
+        model,
+        user_api_key,
+        user_api_key_alias,
+        user_api_team,
+        user_api_team_alias,
+    ).observe(time_to_first_token_seconds)

 if api_call_start_time is not None and isinstance(
     api_call_start_time, datetime
 ):
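The hunk above guards `completion_start_time` but subtracts `api_call_start_time` without checking it first, so a missing start timestamp would raise a `TypeError`. A hedged sketch of a defensive variant of that subtraction (a standalone illustration, not the committed code):

```python
from datetime import datetime, timedelta
from typing import Optional

def time_to_first_token_seconds(
    api_call_start_time: Optional[datetime],
    completion_start_time: Optional[datetime],
) -> Optional[float]:
    """Return TTFT in seconds, or None when either timestamp is missing."""
    # Guard both timestamps before subtracting; datetime - None raises.
    if isinstance(api_call_start_time, datetime) and isinstance(
        completion_start_time, datetime
    ):
        return (completion_start_time - api_call_start_time).total_seconds()
    return None

start = datetime(2024, 10, 21, 12, 0, 0)
print(time_to_first_token_seconds(start, start + timedelta(milliseconds=420)))  # 0.42
print(time_to_first_token_seconds(None, start))  # None
```

Returning `None` lets the caller skip the `.observe()` call entirely instead of recording a bogus sample.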
9 changes: 6 additions & 3 deletions litellm/proxy/_new_secret_config.yaml
@@ -1,5 +1,8 @@
 model_list:
-  - model_name: jina-embedding
+  - model_name: gpt-3.5-turbo
     litellm_params:
-      model: jina_ai/jina-embeddings-v3
-      api_key: jina_658322978426431b9fe41bd6b29563c1wJQ1JDqf13S7BdxA_RkaNfvc-Gdj
+      model: gpt-3.5-turbo
+      api_key: os.environ/OPENAI_API_KEY

+litellm_settings:
+  callbacks: ["prometheus"]
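With the `prometheus` callback enabled, each time-to-first-token observation lands in cumulative histogram buckets. A minimal pure-Python sketch of that bucketing mechanism (illustrative only, not the prometheus_client API; the bucket bounds and sample values are made up):

```python
# Prometheus-style cumulative bucketing: every bucket whose upper bound
# covers the value is incremented, so counts are monotonically non-decreasing.
def observe(counts, buckets, value):
    """Record value into cumulative histogram buckets."""
    for i, upper in enumerate(buckets):
        if value <= upper:
            counts[i] += 1

buckets = [0.1, 0.5, 1.0, 5.0, float("inf")]
counts = [0] * len(buckets)
for ttft in (0.08, 0.3, 2.0):  # three hypothetical TTFT samples, in seconds
    observe(counts, buckets, ttft)
print(counts)  # [1, 2, 2, 3, 3]
```

The real `LATENCY_BUCKETS` passed to the `Histogram` in prometheus.py plays the role of `buckets` here; only the bucket bounds differ.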
