feat(responses)!: improve responses + conversations implementations #3810
Conversation
This PR updates the Conversation item related types and improves a couple of critical parts of the implementation:
- It creates a streaming output item for the final assistant message produced by the model. Until now we only added content parts and included that message in the final response.
- It completely rewrites the conversation update code to account for items other than messages (tool calls, outputs, etc.).
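For illustration, a minimal, self-contained sketch of the first point; the dataclass and event names below are placeholders, not the PR's actual types. The idea is that the final assistant message is announced as an output item (added/done events) rather than appearing only as content parts in the final response.

# Placeholder sketch, not the PR's actual implementation.
from dataclasses import dataclass, field

@dataclass
class OutputItemEvent:
    type: str            # e.g. "response.output_item.added" / "response.output_item.done"
    output_index: int
    item: dict = field(default_factory=dict)

def stream_final_assistant_message(text: str, output_index: int = 0):
    """Bracket the final assistant message with added/done output-item events."""
    item = {"type": "message", "role": "assistant", "content": []}
    yield OutputItemEvent("response.output_item.added", output_index, item)
    # ... content_part / output_text delta events would stream here ...
    item["content"].append({"type": "output_text", "text": text})
    yield OutputItemEvent("response.output_item.done", output_index, item)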
Force-pushed from af111e9 to 85f7ec0 (compare)
    # Fallback to the generic message type as a last resort
    OpenAIResponseMessage,
    | OpenAIResponseOutputMessageMCPCall
This is similar to the change I'm proposing at #3385, but there I'm adding OpenAIResponseOutput directly as a possible input, to always keep both lists in sync (based on a comment I received in that PR).
As the other PR is simpler/shorter, perhaps we can go with it and rebase this on top.
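For reference, a simplified sketch of the approach described above (type and field names are illustrative stand-ins, not the actual llama-stack definitions): embed the output union inside the input union so the two lists cannot drift apart as new output item types are added.

# Simplified illustration only; the real definitions live in the llama-stack Responses API types.
from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field

class OpenAIResponseMessage(BaseModel):
    type: Literal["message"] = "message"

class OpenAIResponseOutputMessageMCPCall(BaseModel):
    type: Literal["mcp_call"] = "mcp_call"

# What the model can produce:
OpenAIResponseOutput = Annotated[
    Union[OpenAIResponseMessage, OpenAIResponseOutputMessageMCPCall],
    Field(discriminator="type"),
]

class OpenAIResponseInputFunctionToolCallOutput(BaseModel):  # input-only item
    type: Literal["function_call_output"] = "function_call_output"

# Accept every output item as input, plus input-only items, in one place:
OpenAIResponseInput = Union[OpenAIResponseOutput, OpenAIResponseInputFunctionToolCallOutput]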
@luis5tb It is actually far more complex; even my changes are only a patch. We need to do a thorough review of everything carefully. My changes here were not motivated by trying to override your PR but came from an independent motivation. It's unfortunate that your PR languished for so long. I also still need to run tests.
@luis5tb in fact I went your route first, but then looked at OpenAI's definitions. They are very subtly different! You may be right, but we need an automated, thorough check of all the Responses type definitions -- there are way too many holes there right now.
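One shape such an automated check could take (a sketch only, assuming pydantic v2 models on both sides; the exact import paths for the llama-stack and openai SDK types would need to be confirmed): diff the declared field names of each local type against its upstream counterpart.

# Sketch of a type-parity check; import paths are assumptions, adjust as needed.
from openai.types.responses import ResponseOutputMessage                     # upstream definition
from llama_stack.apis.agents.openai_responses import OpenAIResponseMessage   # local definition

def field_diff(local_model, upstream_model) -> tuple[set[str], set[str]]:
    """Return (missing_locally, extra_locally) based on declared pydantic fields."""
    local, upstream = set(local_model.model_fields), set(upstream_model.model_fields)
    return upstream - local, local - upstream

missing, extra = field_diff(OpenAIResponseMessage, ResponseOutputMessage)
print("missing locally:", missing, "| extra locally:", extra)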
@github-actions run precommit
⏳ Running pre-commit hooks on PR #3810...
Did a pass and nothing caught my eye. Thanks!
✅ Pre-commit hooks completed successfully! 🔧 Changes have been committed and pushed to the PR branch.
    | OpenAIResponseOutputMessageFunctionToolCall
    | OpenAIResponseOutputMessageFileSearchToolCall
    | OpenAIResponseOutputMessageWebSearchToolCall
    | OpenAIResponseOutputMessageFileSearchToolCall
Nice, I was planning to follow up on these after the latest changes. Thanks!
🚀
@leseb wtf how did that pre-commit magic work man! This is in my forked repo!
Force-pushed from 442399f to 57b3d14 (compare)
Still fixing some responses tests. Not broken by this change honestly, but still (since these things are not in CI yet)...
Note that CI will appear red on this PR because llamastack/llama-stack-client-python#281 needs to be landed concurrently.
    messages = await convert_response_input_to_chat_messages(input)
else:
    # Use stored messages directly and convert only new input
    messages = stored_messages or []
can stored_messages actually be None here?
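For context, a self-contained sketch of the branch the excerpt comes from (the condition and helper wiring are assumptions, not the PR's actual code); the question matters because the `or []` fallback is only reachable if stored messages can legitimately come back as None.

# Placeholder sketch, not the actual implementation.
async def build_messages(input, stored_messages, convert):
    if not stored_messages:
        # No usable history: convert the entire input from scratch
        messages = await convert(input)
    else:
        # Use stored messages directly and convert only new input
        messages = stored_messages or []   # dead code if None can never reach this branch
        messages.extend(await convert(input))
    return messages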
    filter(lambda x: not isinstance(x, OpenAISystemMessageParam), orchestrator.final_messages)
)
if store:
    # TODO: we really should work off of output_items instead of "final_messages"
does this still apply?
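A hypothetical illustration of what the TODO could mean (not code from this PR): store the response's output items directly, since they never include system prompts, instead of filtering system messages out of the chat-level message list as the excerpt above does.

# Hypothetical sketch only.
def messages_to_store(final_messages, system_message_type):
    # Current approach (per the excerpt): drop system messages before storing.
    return [m for m in final_messages if not isinstance(m, system_message_type)]

def items_to_store(output_items):
    # Possible follow-up: persist output items as-is; no system-message filtering needed.
    return list(output_items)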
followup on #3810 Signed-off-by: Sébastien Han <seb@redhat.com>
Test Plan
Used the test script from llamastack/llama-stack-client-python#281 for this