### Description
### Confirm this is a feature request for the Python library and not the underlying OpenAI API.

- This is a feature request for the Python library
### Describe the feature or improvement you're requesting
#### Situation

When calling `AsyncOpenAI(...).beta.chat.completions.parse(..., response_format=SomePydanticModel)`, the OpenAI library raises `LengthFinishReasonError` when `finish_reason == "length"` and `ContentFilterFinishReasonError` when `finish_reason == "content_filter"`, without providing any information about what the response contained.
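A minimal stand-in illustrating the problem (this is not the library's actual implementation; `parse_stub` and its arguments are hypothetical, and only the no-details exception mirrors the behavior described above):

```python
# Illustrative stand-in for the behavior in openai 1.44.1: on a truncated
# response, an exception is raised that carries no response data at all.

class LengthFinishReasonError(Exception):
    """Raised when finish_reason == 'length'; carries no response details."""

def parse_stub(finish_reason: str, usage: dict) -> dict:
    # The real parse() helper checks finish_reason before parsing the
    # structured output; on "length" it raises, and the usage info is lost.
    if finish_reason == "length":
        raise LengthFinishReasonError("Could not parse response: length limit reached")
    return {"usage": usage}

try:
    parse_stub("length", {"prompt_tokens": 12, "completion_tokens": 100})
except LengthFinishReasonError as exc:
    # The caller has no way to reach the usage object from the exception.
    assert not hasattr(exc, "completion")
```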
#### Complication

Because there is no way to retrieve any information about the response, I cannot programmatically save information about the context. For example, I cannot access and track information from the `usage` object in the chat completion response.
#### Desired behavior

As a library user, I always want to know the details of LLM calls that cost me tokens. More specifically, I want to inspect `usage` to know how many tokens I "wasted" calling the LLM, for instance when `max_tokens` was set too low for the LLM to generate a complete structured output. This enables me to track and control costs.
I see two potential solutions:
- Stop raising exceptions in these scenarios and always return a chat completion object. I believe this is the behavior of the non-beta version of the chat completion call
- Attach the response to the exception object as an attribute so that the calling code can use it
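The second option could be sketched roughly as follows (a hypothetical illustration, not the library's API: the `completion` attribute, the dataclasses, and the simplified `parse_chat_completion` are all assumptions made for the example):

```python
from dataclasses import dataclass

# Hypothetical sketch of the second solution: the exception keeps a reference
# to the full completion, so the caller can still inspect usage after a
# truncated response. All names here are illustrative stand-ins.

@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

@dataclass
class ChatCompletion:
    finish_reason: str
    usage: Usage

class LengthFinishReasonError(Exception):
    def __init__(self, *, completion: ChatCompletion) -> None:
        super().__init__("Could not parse response: length limit was reached")
        self.completion = completion  # callers can read completion.usage

def parse_chat_completion(completion: ChatCompletion) -> ChatCompletion:
    # Simplified version of the finish_reason check: still raise,
    # but keep the response reachable from the exception.
    if completion.finish_reason == "length":
        raise LengthFinishReasonError(completion=completion)
    return completion

# The caller can now track the "wasted" tokens:
truncated = ChatCompletion("length", Usage(12, 100, 112))
try:
    parse_chat_completion(truncated)
except LengthFinishReasonError as exc:
    wasted = exc.completion.usage.total_tokens
```

This keeps the existing raise-on-truncation behavior (so no callers break) while making cost tracking possible.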
#### Version used

- 1.44.1 (latest on PyPI at the time of writing)
#### Code location

- File: `openai/lib/_parsing/_completions.py`, lines 71-75
- Function: `parse_chat_completion`
### Additional context

No response