
[Bug]: Spend logs are not saved correctly when using stream=true with stream_options: { include_usage: true } #6633

Open
DamnDaniel7 opened this issue Nov 7, 2024 · 0 comments
Labels
bug Something isn't working

What happened?

Description

When making requests with stream=true and stream_options: { include_usage: true }, the request_id column in the spend logs is written as an empty string. Because request_id is the primary key, only the first such request is saved; every subsequent insert collides with the duplicate empty key and fails to be logged. Requests with only stream=true (without stream_options: { include_usage: true }) are logged correctly, with request_id populated as expected.

Steps to Reproduce

  1. Enable the spend tracking feature that saves spend logs to a database.
  2. Make a request with the following parameters:
    • stream=true
    • stream_options: { include_usage: true }
  3. Observe the spend logs saved in the database.
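The steps above can be sketched as the following request body; the proxy URL and API key below are placeholders I've made up for illustration, not values from this report:

```python
# Builds the chat-completions request body from the reproduction steps.
# PROXY_URL and API_KEY are hypothetical placeholders, not real values.
import json

PROXY_URL = "http://localhost:4000/v1/chat/completions"  # assumed proxy address
API_KEY = "sk-placeholder"                               # assumed virtual key

payload = {
    "model": "gpt4o-mini",
    "user": "user123",
    "stream": True,
    # Requests a final stream chunk carrying token usage; this is the
    # option that coincides with the empty request_id in the spend logs.
    "stream_options": {"include_usage": True},
    "messages": [
        {"role": "system", "content": "You are a snarky assistant."},
        {"role": "user", "content": "tell me a joke"},
    ],
}
body = json.dumps(payload)
# POST `body` to PROXY_URL with an "Authorization: Bearer <API_KEY>" header,
# then inspect the request_id column in the spend logs table.
```

Sending the same body without the stream_options key is the control case that logs correctly.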

Expected Result:

  • Each successful request should have a unique request_id populated and logged in the spend logs.

Actual Result:

  • Only the first request is saved in the spend logs. For subsequent requests, request_id is empty, and the duplicate empty primary-key value prevents additional rows from being inserted.
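A minimal sketch of the failure mode, using sqlite3 and a simplified two-column table purely for illustration (the real deployment is PostgreSQL with LiteLLM's actual schema):

```python
# Demonstrates why an empty request_id primary key loses all but the
# first row: every later insert collides on the same key value ''.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend_logs (request_id TEXT PRIMARY KEY, model TEXT)")

saved, failed = 0, 0
for model in ("gpt4o", "gpt4o-mini", "gpt4o"):
    try:
        # request_id arrives empty when include_usage is set, so all rows
        # share the primary-key value ''.
        conn.execute("INSERT INTO spend_logs VALUES (?, ?)", ("", model))
        saved += 1
    except sqlite3.IntegrityError:
        failed += 1

print(saved, failed)  # → 1 2: only the first insert succeeds
```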

Environment

  • Litellm version: v1.50.2-stable
  • Database type and version: PostgreSQL 14

config.yaml

Example of config.yaml used:

model_list:
  - model_name: gpt4o
    litellm_params:
      model: azure/gpt-4o
      api_base: <api_base>
      api_key: <api_key>
      api_version: 2024-09-01-preview
    model_info:
      base_model: azure/gpt-4o
  - model_name: gpt4o-mini
    litellm_params:
      model: azure/gpt-4o-mini
      api_base: <api_base>
      api_key: <api_key>
      api_version: 2024-09-01-preview
    model_info:
      base_model: azure/gpt-4o-mini

Relevant log output

22:51:02 - LiteLLM Proxy:DEBUG: proxy_server.py:3273 - Request received by LiteLLM:
{
    "model": "gpt4o-mini",
    "user": "user123",
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "messages": [
        {
            "role": "system",
            "content": "You are a snarky assistant."
        },
        {
            "role": "user",
            "content": "tell me a joke"
        }
    ]
}
...
22:51:04 - LiteLLM Proxy:DEBUG: proxy_server.py:757 - INSIDE _PROXY_track_cost_callback
22:51:04 - LiteLLM Proxy:DEBUG: proxy_server.py:761 - Proxy: In track_cost_callback for: kwargs={'model': 'gpt-4o-mini', 'messages': [{'role': 'system', 'content': 'You are a snarky assistant.'}, {'role': 'user', 'content': 'tell me a joke'}], 'optional_params': {'stream': True, 'stream_options': {'include_usage': True}, 'user': '[REDACTED_USER]', 'extra_body': {}}, 'litellm_params': {'acompletion': True, 'api_key': '[REDACTED_API_KEY]', 'force_timeout': 600, 'logger_fn': None, 'verbose': False, 'custom_llm_provider': 'azure', 'api_base': '[REDACTED_API_BASE]', 'litellm_call_id': '[REDACTED_CALL_ID]', 'model_alias_map': {}, 'completion_call_id': None, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': '[REDACTED]', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': '[REDACTED_USER_ID]', 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key': '[REDACTED]', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.51.3', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_metadata': {}, 'headers': {'content-type': 'application/json', 'user-agent': '[REDACTED]', 'accept': '*/*', 'cache-control': 'no-cache', 'host': '[REDACTED_HOST]', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '348'}, 'endpoint': '[REDACTED_ENDPOINT]', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'gpt4o-mini', 'model_group_size': 1, 'deployment': 'azure/gpt-4o-mini', 'model_info': {'id': '[REDACTED_MODEL_ID]', 'db_model': False, 'base_model': 'azure/gpt-4o-mini'}, 'api_base': '[REDACTED_API_BASE]', 'caching_groups': None}, 'model_info': {'id': '[REDACTED_MODEL_ID]', 'db_model': False, 'base_model': 'azure/gpt-4o-mini'}, 'proxy_server_request': {'url': '[REDACTED_ENDPOINT]', 'method': 'POST', 'headers': {'content-type': 
'application/json', 'user-agent': '[REDACTED]', 'accept': '*/*', 'cache-control': 'no-cache', 'host': '[REDACTED_HOST]', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '348'}, 'body': {'model': 'gpt4o-mini', 'user': '[REDACTED_USER]', 'stream': True, 'stream_options': {'include_usage': True}, 'messages': [{'role': 'system', 'content': 'You are a snarky assistant.'}, {'role': 'user', 'content': 'tell me a joke'}]}}, 'preset_cache_key': '[REDACTED_CACHE_KEY]', 'no-log': False, 'stream_response': {}, 'input_cost_per_token': None, 'input_cost_per_second': None, 'output_cost_per_token': None, 'output_cost_per_second': None, 'cooldown_time': None, 'text_completion': None, 'azure_ad_token_provider': None, 'user_continue_message': None, 'base_model': 'azure/gpt-4o-mini'}, 'start_time': datetime.datetime(2024, 11, 6, 22, 51, 2, 861890), 'stream': True, 'user': '[REDACTED_USER]', 'call_type': 'acompletion', 'litellm_call_id': '[REDACTED_CALL_ID]', 'completion_start_time': datetime.datetime(2024, 11, 6, 22, 51, 3, 691795), 'standard_callback_dynamic_params': {}, 'stream_options': {'include_usage': True}, 'extra_body': {}, 'custom_llm_provider': 'azure', 'input': [{'role': 'system', 'content': 'You are a snarky assistant.'}, {'role': 'user', 'content': 'tell me a joke'}], 'api_key': '[REDACTED_API_KEY]', 'original_response': <coroutine object AzureChatCompletion.async_streaming at [REDACTED_MEMORY_ADDRESS]>, 'additional_args': {'headers': {'api_key': '[REDACTED_API_KEY]', 'azure_ad_token': None}, 'api_base': ParseResult(scheme='https', userinfo='', host='[REDACTED_HOST]', port=None, path='/openai/', query=None, fragment=None), 'acompletion': True, 'complete_input_dict': {'model': 'gpt4o-mini', 'messages': [{'role': 'system', 'content': 'You are a snarky assistant.'}, {'role': 'user', 'content': 'tell me a joke'}], 'stream': True, 'stream_options': {'include_usage': True}, 'user': '[REDACTED_USER]', 'extra_body': {}}}, 'log_event_type': 
'post_api_call', 'api_call_start_time': datetime.datetime(2024, 11, 6, 22, 51, 2, 871645), 'response_headers': {'transfer-encoding': 'chunked', 'content-type': 'text/event-stream; charset=utf-8', 'apim-request-id': '[REDACTED_REQUEST_ID]', 'strict-transport-security': 'max-age=31536000; includeSubDomains; preload', 'x-ratelimit-remaining-requests': '9998', 'x-accel-buffering': 'no', 'x-ms-rai-invoked': 'true', 'x-request-id': '[REDACTED_REQUEST_ID]', 'x-content-type-options': 'nosniff', 'azureml-model-session': '[REDACTED_SESSION]', 'x-ms-region': 'East US', 'x-envoy-upstream-service-time': '109', 'x-ms-client-request-id': '[REDACTED_CLIENT_REQUEST_ID]', 'x-ratelimit-remaining-tokens': '998699', 'date': 'Wed, 06 Nov 2024 22:51:02 GMT'}, 'end_time': datetime.datetime(2024, 11, 6, 22, 51, 4, 463244), 'cache_hit': False, 'response_cost': [REDACTED_COST], 'complete_streaming_response': ModelResponse(id='', choices=[Choices(finish_reason='stop', index=0, message=Message(content='Ah, ...?', role='assistant', tool_calls=None, function_call=None))], created=[REDACTED_TIMESTAMP], model='', object='chat.completion', system_fingerprint=None, usage=Usage(completion_tokens=37, prompt_tokens=24, total_tokens=61, completion_tokens_details=None, prompt_tokens_details=None))}
22:51:04 - LiteLLM Proxy:DEBUG: proxy_server.py:766 - kwargs stream: True + complete streaming response: ModelResponse(id='', choices=[Choices(finish_reason='stop', index=0, message=Message(content='Ah, ...', tool_calls=None, function_call=None))], created=[REDACTED_TIMESTAMP], model='', object='chat.completion', system_fingerprint=None, usage=Usage(completion_tokens=37, prompt_tokens=24, total_tokens=61, completion_tokens_details=None, prompt_tokens_details=None))}
22:51:04 - LiteLLM Proxy:DEBUG: proxy_server.py:788 - user_api_key [REDACTED], prisma_client: <litellm.proxy.utils.PrismaClient object at [REDACTED_MEMORY_ADDRESS]>
22:51:04 - LiteLLM Proxy:DEBUG: proxy_server.py:893 - Enters prisma db call, response_cost: [REDACTED_COST], token: [REDACTED]; user_id: [REDACTED_USER_ID]; team_id: None
22:51:04 - LiteLLM Proxy:DEBUG: spend_tracking_utils.py:40 - SpendTable: get_logging_payload - kwargs: {'model': 'gpt-4o-mini', 'messages': [{'role': 'system', 'content': 'You are a snarky assistant.'}, {'role': 'user', 'content': 'tell me a joke'}], 'optional_params': {'stream': True, 'stream_options': {'include_usage': True}, 'user': '[REDACTED_USER]', 'extra_body': {}}, 'litellm_params': {'acompletion': True, 'api_key': '[REDACTED_API_KEY]', 'force_timeout': 600, 'logger_fn': None, 'verbose': False, 'custom_llm_provider': 'azure', 'api_base': '[REDACTED_API_BASE]', 'litellm_call_id': '[REDACTED_CALL_ID]', 'model_alias_map': {}, 'completion_call_id': None, 'metadata': {'requester_metadata': {}, 'user_api_key_hash': '[REDACTED]', 'user_api_key_alias': None, 'user_api_key_team_id': None, 'user_api_key_user_id': '[REDACTED_USER_ID]', 'user_api_key_org_id': None, 'user_api_key_team_alias': None, 'user_api_key': '[REDACTED]', 'user_api_end_user_max_budget': None, 'litellm_api_version': '1.51.3', 'global_max_parallel_requests': None, 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': None, 'user_api_key_spend': 0.0, 'user_api_key_max_budget': None, 'user_api_key_metadata': {}, 'headers': {'content-type': 'application/json', 'user-agent': '[REDACTED]', 'accept': '*/*', 'cache-control': 'no-cache', 'host': '[REDACTED_HOST]', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '348'}, 'endpoint': '[REDACTED_ENDPOINT]', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'gpt4o-mini', 'model_group_size': 1, 'deployment': 'azure/gpt-4o-mini', 'model_info': {'id': '[REDACTED_MODEL_ID]', 'db_model': False, 'base_model': 'azure/gpt-4o-mini'}, 'api_base': '[REDACTED_API_BASE]', 'caching_groups': None}, 'model_info': {'id': '[REDACTED_MODEL_ID]', 'db_model': False, 'base_model': 'azure/gpt-4o-mini'}, 'proxy_server_request': {'url': '[REDACTED_ENDPOINT]', 'method': 'POST', 'headers': {'content-type': 
'application/json', 'user-agent': '[REDACTED]', 'accept': '*/*', 'cache-control': 'no-cache', 'host': '[REDACTED_HOST]', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '348'}, 'body': {'model': 'gpt4o-mini', 'user': '[REDACTED_USER]', 'stream': True, 'stream_options': {'include_usage': True}, 'messages': [{'role': 'system', 'content': 'You are a snarky assistant.'}, {'role': 'user', 'content': 'tell me a joke'}]}}, 'preset_cache_key': '[REDACTED_CACHE_KEY]', 'no-log': False, 'stream_response': {}, 'input_cost_per_token': None, 'input_cost_per_second': None, 'output_cost_per_token': None, 'output_cost_per_second': None, 'cooldown_time': None, 'text_completion': None, 'azure_ad_token_provider': None, 'user_continue_message': None, 'base_model': 'azure/gpt-4o-mini'}}
22:51:04 - LiteLLM Proxy:DEBUG: spend_tracking_utils.py:48 - SpendTable: get_logging_payload - user_id: [REDACTED_USER_ID], team_id: None
22:51:04 - LiteLLM Proxy:DEBUG: spend_tracking_utils.py:84 - SpendTable: PrismaSession: User[REDACTED_USER_ID] - Prisma Tracking Table Creation Attempt.

