feat: OpenAI Responses API #1989
Conversation
Status as of April 18

I've moved the tests for the OpenAI Responses API over to `tests/verifications/openai_api/test_response.py`.

Testing

To test this locally, ensure you have
Then, run the specific provider verification tests you want. Examples that I've run are below, along with their pass/fail rate.

OpenAI - 100% Passing

This hits the hosted API on openai.com directly, i.e. not using Llama Stack. This is to ensure our verification tests are passing against an implementation besides Llama Stack.
OpenAI via Llama Stack - 100% Passing

This uses Llama Stack's implementation of the OpenAI Responses API, handing the actual inference off to OpenAI as the backend inference provider. An important distinction here is that Llama Stack is doing all of the Responses API handling, and we're only using the Chat Completions API against the backend inference provider, which in this case is OpenAI. This ensures we're handling the nuances of the Responses API properly by isolating any differences introduced by using other inference providers or models.
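As a rough sketch of what this comparison looks like from the client side (not code from this PR; the Llama Stack base URL and the `TEST_TARGET` switch are assumptions for illustration), the same Responses call can target either the hosted API or a Llama Stack server:

```python
# Illustrative sketch only -- not code from this PR. The Llama Stack base_url
# and the TEST_TARGET environment variable are assumptions for illustration.
import os
from openai import OpenAI

if os.environ.get("TEST_TARGET") == "llama-stack":
    # Llama Stack handles the Responses API itself and forwards inference to
    # the backend provider over the Chat Completions API.
    client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
else:
    # Hit the hosted API on openai.com directly (uses OPENAI_API_KEY).
    client = OpenAI()

response = client.responses.create(model="gpt-4o", input="Say hello.")
print(response.output_text)
```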
together.ai via Llama Stack - 100% Passing

Now the tests get more interesting, as we start testing against Llama 3.3 and Llama 4 model variants using a backend inference provider that has no native OpenAI Responses API support.
Fireworks AI via Llama Stack - 77% Passing

This also uses Llama 3.3 and Llama 4 models. The web search tool tests are failing with this one, which appears to be an issue with how tool calls are being constructed by the backend inference service.
Untested and/or not working yet
Signed-off-by: Ben Browning <bbrownin@redhat.com>
This extracts out a helper to convert previous responses into messages and to convert OpenAI choices (from a chat completion response) into output messages for the OpenAI Responses output (see the sketch after this commit list). Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
This moves the OpenAI Responses API tests under tests/verifications/openai_api/test_response.py and starts to wire them up to our verification suite, so that we can test multiple providers as well as OpenAI directly for the Responses API. Signed-off-by: Ben Browning <bbrownin@redhat.com>
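For intuition only, here is a hypothetical sketch of such a choice-to-output-message conversion. It is not the actual helper from this PR; the dict field names simply follow the public OpenAI Responses output-message shape.

```python
# Hypothetical sketch, not the helper added in this PR. Converts one chat
# completion "choice" dict into a Responses-style output message; field names
# follow the public OpenAI Responses output-message shape.
def choice_to_output_message(choice: dict) -> dict:
    message = choice["message"]
    return {
        "type": "message",
        "role": message.get("role", "assistant"),
        "status": "completed",
        "content": [
            {
                "type": "output_text",
                "text": message.get("content") or "",
                "annotations": [],
            }
        ],
    }
```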
I verified that:

worked for all three models. There was a small issue around tool calling, which I fixed.
Given that this represents a major advance (even though the implementation is not complete or robust) in terms of what the Stack can offer, I am landing it :) And also want to avoid conflicts!! :)
Nice, thanks for the fixes and landing this! I'll work on setting up some nightly testing for this across lots of our providers so we have a baseline to track and expand compatibility with over time.
I ran the new OpenAI Responses API verification tests from
It will take me a bit longer to get some nightly testing set up for this and to get some manual test results using the ollama and remote-vllm providers. But the changes look good across the hosted providers generally so far.
client-side support for the in-progress API surface: llamastack/llama-stack#1989
MCP can work transparently via server-side execution. But custom tools need to come back as a tool call to the client, right?
Yes, custom tools will come back to the client as a tool call, i.e. https://platform.openai.com/docs/guides/function-calling?api-mode=responses. We will have a decision to make (or expose the decision to the person running the Llama Stack server) as to which (if any) MCP tools we invoke automatically via server-side execution and which we send back to the client. In other words, we may choose to allow the person configuring a Stack to extend the list of "builtin" tools that automatically get called from an OpenAI Responses API point of view to include some MCP tools.
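To sketch what "coming back to the client as a tool call" looks like in practice, here is a hypothetical client-side round trip based on the public Responses API function-calling flow, not on this PR's code; the base URL and model name are placeholders.

```python
# Hypothetical sketch of the client-side function-call round trip. The base_url
# and model are placeholder assumptions, not confirmed by this PR.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Look up current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="What's the weather in Paris?",
    tools=tools,
)

# Custom tools are not executed server-side; they come back as function_call
# output items that the client must run itself and then answer.
for item in response.output:
    if item.type == "function_call":
        args = json.loads(item.arguments)
        result = f"Sunny in {args['city']}"  # pretend tool execution
        followup = client.responses.create(
            model="meta-llama/Llama-3.3-70B-Instruct",
            previous_response_id=response.id,
            input=[{
                "type": "function_call_output",
                "call_id": item.call_id,
                "output": result,
            }],
            tools=tools,
        )
        print(followup.output_text)
```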
# What does this PR do?

This provides an initial [OpenAI Responses API](https://platform.openai.com/docs/api-reference/responses) implementation. The API is not yet complete, and this is more a proof-of-concept to show how we can store responses in our key-value stores and use them to support Responses API concepts like `previous_response_id`.

## Test Plan

I've added a new `tests/integration/openai_responses/test_openai_responses.py` as part of test-driven development for this new API. I'm only testing this locally with the remote-vllm provider for now, but it should work with any of our inference providers, since the only API it requires from the inference provider is the `openai_chat_completion` endpoint.

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack build --template remote-vllm --image-type venv --run
```

```
LLAMA_STACK_CONFIG="http://localhost:8321" \
python -m pytest -v \
  tests/integration/openai_responses/test_openai_responses.py \
  --text-model "meta-llama/Llama-3.2-3B-Instruct"
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
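To make the `previous_response_id` concept concrete, here is a minimal client-side sketch (not part of this PR) that chains two responses; the Llama Stack base URL path is an assumption.

```python
# Minimal sketch, not from this PR: chains two Responses API calls via
# previous_response_id. The base_url path is an assumption about how a
# Llama Stack server exposes its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",  # assumed Llama Stack endpoint
    api_key="none",  # local server; the key is unused
)

first = client.responses.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    input="Name a city in France.",
)

# The second request references the first, so the server can replay the stored
# conversation history before appending the new input.
followup = client.responses.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    input="What river runs through it?",
    previous_response_id=first.id,
)
print(followup.output_text)
```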