
(bug fix) SpendLogs update DB catch all possible DB errors for retrying #7082

Merged: 6 commits into main, Dec 7, 2024

Conversation

@ishaan-jaff (Contributor) commented Dec 7, 2024

SpendLogs reliability improvements

  • The retry mechanism only caught `except httpx.ReadTimeout:`, so users saw unhandled `httpx.ConnectError` exceptions during partial DB outages; all connection-related DB errors are now caught for retrying (see the sketch below)
  • Fixes exponential backoff when updating SpendLogs in the DB
  • Correctly raises the error once the number of retries is exceeded
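
A minimal sketch of this pattern, assuming `DB_CONNECTION_ERROR_TYPES` covers the httpx connection errors named in the tests below; `_write_spend_logs_to_db` is a hypothetical helper, not litellm's actual internal function:

```python
import asyncio
import httpx

# Catch every connection-related error httpx can raise during a partial DB outage,
# not just ReadTimeout (assumed membership for illustration).
DB_CONNECTION_ERROR_TYPES = (httpx.ConnectError, httpx.ReadError, httpx.ReadTimeout)


async def update_spend_logs_with_retry(logs: list, max_retries: int = 3) -> None:
    """Retry the DB write on connection errors, backing off exponentially."""
    for attempt in range(max_retries + 1):
        try:
            await _write_spend_logs_to_db(logs)  # hypothetical DB write helper
            return
        except DB_CONNECTION_ERROR_TYPES as e:
            if attempt >= max_retries:
                # Re-raise once the retry budget is exhausted so callers see the failure.
                raise e
            await asyncio.sleep(2**attempt)  # exponential backoff: 1s, 2s, 4s, ...
```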

Relevant issues

Type

🐛 Bug Fix
✅ Test

Changes

[REQUIRED] Testing - Attach a screenshot of any new tests passing locally

If UI changes, send a screenshot/GIF of working UI fixes

  • Tests retry mechanism for various database connection errors (ConnectError, ReadError, ReadTimeout), verifying successful completion after multiple retries and proper transaction cleanup
  • Validates proper failure handling when maximum retry attempts are exceeded, ensuring correct error propagation and failure handler notification
  • Verifies immediate failure for non-connection related errors without retry attempts, confirming proper error handling and failure notification
  • Tests exponential backoff implementation between retry attempts, confirming correct delay intervals (1s, 2s) and successful completion
  • Validates processing of large batches of spend logs (150 logs in batches of 100), ensuring correct batch splitting, processing order, and complete log handling
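
A rough illustration of the batch splitting exercised by the last test above (a sketch with assumed helper names; the actual litellm implementation may differ):

```python
def split_logs_into_batches(logs: list, batch_size: int = 100) -> list[list]:
    """Split spend logs into fixed-size batches, preserving the original order."""
    return [logs[i : i + batch_size] for i in range(0, len(logs), batch_size)]


# 150 logs -> two batches: 100 logs, then the remaining 50, in order.
batches = split_logs_into_batches(list(range(150)))
assert [len(b) for b in batches] == [100, 50]
```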


vercel bot commented Dec 7, 2024

The latest updates on your projects:

litellm: ✅ Ready, updated Dec 7, 2024 10:16pm (UTC)


codecov bot commented Dec 7, 2024

Codecov Report

Attention: Patch coverage is 51.35135% with 18 lines in your changes missing coverage. Please review.

Files with missing lines: litellm/proxy/utils.py (patch coverage 47.05%, 18 lines missing ⚠️)


@BerriAI BerriAI deleted a comment from delve-auditor bot Dec 7, 2024
@BerriAI BerriAI deleted a comment from delve-auditor bot Dec 7, 2024
@BerriAI BerriAI deleted a comment from delve-auditor bot Dec 7, 2024
@ishaan-jaff ishaan-jaff merged commit c33cebb into main Dec 7, 2024
26 of 28 checks passed
krrishdholakia pushed a commit that referenced this pull request Dec 8, 2024
(bug fix) SpendLogs update DB catch all possible DB errors for retrying (#7082)

* catch DB_CONNECTION_ERROR_TYPES

* fix DB retry mechanism for SpendLog updates

* use DB_CONNECTION_ERROR_TYPES in auth checks

* fix exp back off for writing SpendLogs

* use _raise_failed_update_spend_exception to ensure errors print as NON blocking

* test_update_spend_logs_multiple_batches_with_failure
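
As a hedged sketch of the non-blocking error pattern named in the commit messages above, one plausible shape for `_raise_failed_update_spend_exception` (the real helper's signature and logging calls in litellm are not reproduced here; `error_logger` is a hypothetical failure-handler callback):

```python
import traceback


def _raise_failed_update_spend_exception(e: Exception, error_logger=None) -> None:
    """Log the spend-update failure as non-blocking, notify the failure handler, then re-raise."""
    error_msg = f"[Non-Blocking] Error updating spend in DB: {e}"
    print(error_msg)  # surfaced as a non-blocking log line
    traceback.print_exc()
    if error_logger is not None:
        error_logger(error_msg)  # hypothetical proxy failure handler callback
    raise e
```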
krrishdholakia added a commit that referenced this pull request Dec 8, 2024
* fix(main.py): support passing max retries to azure/openai embedding integrations

Fixes #7003

* feat(team_endpoints.py): allow updating team model aliases

Closes #6956

* feat(router.py): allow specifying model id as fallback - skips any cooldown check

Allows a default model to be checked if all models in cooldown

s/o @micahjsmith

* docs(reliability.md): add fallback to specific model to docs

* fix(utils.py): new 'is_prompt_caching_valid_prompt' helper util

Allows user to identify if messages/tools have prompt caching

Related issue: #6784

* feat(router.py): store model id for prompt caching valid prompt

Allows routing to that model id on subsequent requests

* fix(router.py): only cache if prompt is valid prompt caching prompt

prevents storing unnecessary items in cache

* feat(router.py): support routing prompt caching enabled models to previous deployments

Closes #6784

* test: fix linting errors

* feat(databricks/): convert basemodel to dict and exclude none values

allow passing pydantic message to databricks

* fix(utils.py): ensure all chat completion messages are dict

* (feat) Track `custom_llm_provider` in LiteLLMSpendLogs (#7081)

* add custom_llm_provider to SpendLogsPayload

* add custom_llm_provider to SpendLogs

* add custom llm provider to SpendLogs payload

* test_spend_logs_payload

* Add MLflow to the side bar (#7031)

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* (bug fix) SpendLogs update DB catch all possible DB errors for retrying  (#7082)

* catch DB_CONNECTION_ERROR_TYPES

* fix DB retry mechanism for SpendLog updates

* use DB_CONNECTION_ERROR_TYPES in auth checks

* fix exp back off for writing SpendLogs

* use _raise_failed_update_spend_exception to ensure errors print as NON blocking

* test_update_spend_logs_multiple_batches_with_failure

* (Feat) Add StructuredOutputs support for Fireworks.AI (#7085)

* fix model cost map fireworks ai "supports_response_schema": true,

* fix supports_response_schema

* fix map openai params fireworks ai

* test_map_response_format

* test_map_response_format

* added deepinfra/Meta-Llama-3.1-405B-Instruct (#7084)

* bump: version 1.53.9 → 1.54.0

* fix deepinfra

* litellm db fixes LiteLLM_UserTable (#7089)

* ci/cd queue new release

* fix llama-3.3-70b-versatile

* refactor - use consistent file naming convention `AI21/` -> `ai21`  (#7090)

* fix refactor - use consistent file naming convention

* ci/cd run again

* fix naming structure

* fix use consistent naming (#7092)

---------

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Yuki Watanabe <31463517+B-Step62@users.noreply.github.com>
Co-authored-by: ali sayyah <ali.sayyah2@gmail.com>
rajatvig pushed a commit to rajatvig/litellm that referenced this pull request Jan 16, 2025
rajatvig pushed a commit to rajatvig/litellm that referenced this pull request Jan 16, 2025