
[Feature]: Structured Response should be parsable just like OpenAI SDK #7501

Open
hem210 opened this issue Jan 2, 2025 · 1 comment
Labels: enhancement (New feature or request), mlops, user request

Comments

hem210 commented Jan 2, 2025

The Feature

When streaming Structured Output with OpenAI's SDK, each event arrives already parsable: alongside the raw delta chunk from the LLM, the event carries the response accumulated so far, parsed into the provided Pydantic model.

Below is the format of the events streamed by the OpenAI SDK:

ChunkEvent(type='chunk', chunk=ChatCompletionChunk(id='chatcmpl-Al8DGCE', choices=[Choice(delta=ChoiceDelta(content='":["', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None, content_filter_results={'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}})], created=1735796350, model='gpt-4o-mini-2024-07-18', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_51', usage=None), snapshot=ParsedChatCompletion[object](id='', choices=[ParsedChoice[object](finish_reason=None, index=0, logprobs=None, message=ParsedChatCompletionMessage[object](content='{"attributes":["', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None, parsed={'attributes': []}), content_filter_results={})], created=0, model='', object='chat.completion', service_tier=None, system_fingerprint='fp_51', usage=None, prompt_filter_results=[{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}]))
ContentDeltaEvent(type='content.delta', delta='":["', snapshot='{"attributes":["', parsed={'attributes': []})
ChunkEvent(type='chunk', chunk=ChatCompletionChunk(id='chatcmpl-Al8DGCE', choices=[Choice(delta=ChoiceDelta(content='quick', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None, content_filter_results={'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}})], created=1735796350, model='gpt-4o-mini-2024-07-18', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_51', usage=None), snapshot=ParsedChatCompletion[object](id='', choices=[ParsedChoice[object](finish_reason=None, index=0, logprobs=None, message=ParsedChatCompletionMessage[object](content='{"attributes":["quick', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None, parsed={'attributes': []}), content_filter_results={})], created=0, model='', object='chat.completion', service_tier=None, system_fingerprint='fp_51', usage=None, prompt_filter_results=[{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}]))
ContentDeltaEvent(type='content.delta', delta='quick', snapshot='{"attributes":["quick', parsed={'attributes': []})
ChunkEvent(type='chunk', chunk=ChatCompletionChunk(id='chatcmpl-Al8DGCE', choices=[Choice(delta=ChoiceDelta(content='","', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None, content_filter_results={'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}})], created=1735796350, model='gpt-4o-mini-2024-07-18', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_51', usage=None), snapshot=ParsedChatCompletion[object](id='', choices=[ParsedChoice[object](finish_reason=None, index=0, logprobs=None, message=ParsedChatCompletionMessage[object](content='{"attributes":["quick","', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None, parsed={'attributes': ['quick']}), content_filter_results={})], created=0, model='', object='chat.completion', service_tier=None, system_fingerprint='fp_51', usage=None, prompt_filter_results=[{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}]))
ContentDeltaEvent(type='content.delta', delta='","', snapshot='{"attributes":["quick","', parsed={'attributes': ['quick']})
ChunkEvent(type='chunk', chunk=ChatCompletionChunk(id='chatcmpl-Al8DGCE', choices=[Choice(delta=ChoiceDelta(content='brown', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None, content_filter_results={'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}})], created=1735796350, model='gpt-4o-mini-2024-07-18', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_51', usage=None), snapshot=ParsedChatCompletion[object](id='', choices=[ParsedChoice[object](finish_reason=None, index=0, logprobs=None, message=ParsedChatCompletionMessage[object](content='{"attributes":["quick","brown', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None, parsed={'attributes': ['quick']}), content_filter_results={})], created=0, model='', object='chat.completion', service_tier=None, system_fingerprint='fp_51', usage=None, prompt_filter_results=[{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}]))
ContentDeltaEvent(type='content.delta', delta='brown', snapshot='{"attributes":["quick","brown', parsed={'attributes': ['quick']})
ChunkEvent(type='chunk', chunk=ChatCompletionChunk(id='chatcmpl-Al8DGCE', choices=[Choice(delta=ChoiceDelta(content='","', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None, content_filter_results={'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}})], created=1735796350, model='gpt-4o-mini-2024-07-18', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_51', usage=None), snapshot=ParsedChatCompletion[object](id='', choices=[ParsedChoice[object](finish_reason=None, index=0, logprobs=None, message=ParsedChatCompletionMessage[object](content='{"attributes":["quick","brown","', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None, parsed={'attributes': ['quick', 'brown']}), content_filter_results={})], created=0, model='', object='chat.completion', service_tier=None, system_fingerprint='fp_51', usage=None, prompt_filter_results=[{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}]))
ContentDeltaEvent(type='content.delta', delta='","', snapshot='{"attributes":["quick","brown","', parsed={'attributes': ['quick', 'brown']})
ChunkEvent(type='chunk', chunk=ChatCompletionChunk(id='chatcmpl-Al8DGCE', choices=[Choice(delta=ChoiceDelta(content='lazy', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None, content_filter_results={'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}})], created=1735796350, model='gpt-4o-mini-2024-07-18', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_51', usage=None), snapshot=ParsedChatCompletion[object](id='', choices=[ParsedChoice[object](finish_reason=None, index=0, logprobs=None, message=ParsedChatCompletionMessage[object](content='{"attributes":["quick","brown","lazy', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None, parsed={'attributes': ['quick', 'brown']}), content_filter_results={})], created=0, model='', object='chat.completion', service_tier=None, system_fingerprint='fp_51', usage=None, prompt_filter_results=[{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}]))
ContentDeltaEvent(type='content.delta', delta='lazy', snapshot='{"attributes":["quick","brown","lazy', parsed={'attributes': ['quick', 'brown']})

Motivation, pitch

As the output above shows, each event carries both the content delta being generated (chunk.choices[0].delta.content on the ChunkEvent) and the parsable output accumulated so far (the parsed field on the ContentDeltaEvent and on the snapshot). This lets me consume the response as structured output while it is still streaming. I am working on a use case where I need to stream and parse the response as it arrives in Structured Output, without waiting for the completion to finish. Please add this feature to litellm.
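At its core, such a feature re-parses the accumulated snapshot after each delta. As a naive, self-contained illustration of that idea (not OpenAI's or litellm's actual implementation), a best-effort partial-JSON parse can be sketched like this:

```python
import json


def parse_partial_json(text: str):
    """Best-effort parse of an incomplete JSON snapshot.

    Naive sketch only: it closes any unterminated string and any open
    arrays/objects, then tries json.loads. Unlike OpenAI's helper, which
    drops an incomplete trailing string token, this version keeps it.
    """
    closers = []       # closing brackets still owed, innermost last
    in_string = False  # currently inside a string literal?
    escaped = False    # previous char was a backslash inside a string?
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]":
            closers.pop()
    completed = text + ('"' if in_string else "") + "".join(reversed(closers))
    try:
        return json.loads(completed)
    except json.JSONDecodeError:
        return None  # e.g. snapshot ends mid-key or right after a comma
```

For example, the mid-stream snapshot `{"attributes":["quick","brown` parses to `{"attributes": ["quick", "brown"]}` under this scheme.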

Are you a ML Ops Team?

Yes

Twitter / LinkedIn details

No response

@krrishdholakia (Contributor) commented:

Can you share your OpenAI code / a minimal script we can test against, @hem210?
