
Conversation

bbrowning
Collaborator

What does this PR do?

This provides an initial OpenAI Responses API implementation. The API is not yet complete, and this is more a proof-of-concept to show how we can store responses in our key-value stores and use them to support the Responses API concepts like previous_response_id.
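For context on what previous_response_id enables, here's a minimal client-side sketch of the flow (the base URL, API key, and model name below are illustrative assumptions, not part of this PR):

```python
from openai import OpenAI

# Point the standard OpenAI client at a Llama Stack server's
# OpenAI-compatible endpoint (URL, key, and model are illustrative).
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

first = client.responses.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    input="What is the capital of France?",
)

# The second turn only sends the new input; the server looks up the
# stored first response by ID and rebuilds the conversation context.
followup = client.responses.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    input="What is its population?",
    previous_response_id=first.id,
)
print(followup.output_text)
```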

Test Plan

I've added a new tests/integration/openai_responses/test_openai_responses.py as part of a test-driven development approach for this new API. I'm only testing this locally with the remote-vllm provider for now, but it should work with any of our inference providers, since the only API it requires from the inference provider is the openai_chat_completion endpoint.

VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack build --template remote-vllm --image-type venv --run
LLAMA_STACK_CONFIG="http://localhost:8321" \
python -m pytest -v \
  tests/integration/openai_responses/test_openai_responses.py \
  --text-model "meta-llama/Llama-3.2-3B-Instruct"
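For a sense of the shape of these tests, here's a hedged sketch in the style of that file (the openai_client fixture and text_model_id parameter are assumptions, not the actual test code):

```python
# Hypothetical sketch of a test in the style of
# tests/integration/openai_responses/test_openai_responses.py;
# the openai_client fixture and text_model_id are assumptions.
def test_basic_non_streaming_response(openai_client, text_model_id):
    response = openai_client.responses.create(
        model=text_model_id,
        input="Reply with exactly one short sentence.",
        stream=False,
    )
    # The response should carry an ID (so later turns can reference it via
    # previous_response_id) and non-empty output.
    assert response.id
    assert len(response.output) > 0
    assert response.output_text.strip()
```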

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 17, 2025
@bbrowning bbrowning changed the title from "Stub in an initial OpenAI Responses API" to "feat: OpenAI Responses API" Apr 17, 2025
@bbrowning bbrowning force-pushed the openai-responses-api branch from 2608b8f to 5230dac on April 18, 2025 19:36
@bbrowning
Collaborator Author

Status as of April 18

I've moved the tests for the OpenAI Responses API over to tests/verifications/openai_api/test_response.py so I can start testing this in a similar way to how we test the OpenAI Chat Completions API. What's currently tested:

  • basic streaming and non-streaming responses
  • multi-turn non-streaming responses that use previous_response_id to pass context server-side between turns
  • non-streaming tool calling using the built-in web search tool
  • non-streaming multi-turn image input combined with web search tool calling, performing web searches based on the contents of an image, again passing context server-side between turns instead of managing it client-side

See the test file mentioned above for full examples.
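As a rough illustration of the built-in web search case, a hedged sketch (the model name is illustrative, the exact tool type string may differ from what the tests use, and `client` is an OpenAI client pointed at the Llama Stack server):

```python
# Hedged sketch of a built-in web search tool call via the Responses API;
# the model name and tool type string are illustrative.
response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="What was the score of the most recent Super Bowl?",
    tools=[{"type": "web_search"}],
)

# The output should contain a web_search_call item executed server-side,
# followed by the final assistant message grounded in the search results.
for item in response.output:
    print(item.type)
print(response.output_text)
```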

Testing

To test this locally, ensure you have OPENAI_API_KEY and TAVILY_SEARCH_API_KEY set up, as well as any additional inference provider API keys you want to test with, such as TOGETHER_API_KEY for the Together provider. Then, start the Llama Stack server configured for verification tests:

llama stack run --image-type venv tests/verifications/openai-api-verification-run.yaml

Then, run the specific provider verification tests you want. Examples I've run are below, along with their pass/fail rates.

OpenAI - 100% Passing

This hits the hosted API on openai.com directly, i.e. without using Llama Stack. It ensures our verification tests pass against an implementation besides Llama Stack.

python -m pytest -s -v tests/verifications/openai_api/test_response.py --provider=openai

OpenAI via Llama Stack - 100% Passing

This uses Llama Stack's implementation of the OpenAI Responses API, handing the actual inference off to OpenAI as the backend inference provider. An important distinction here is that Llama Stack is doing all the Responses API handling, and we're only using the Chat Completions API against the backend inference provider, which in this case is OpenAI. This ensures we're handling nuances of the Responses API properly, by isolating any differences introduced by using other inference providers or models.

python -m pytest -s -v tests/verifications/openai_api/test_response.py --provider=openai-llama-stack
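Conceptually, the flow described above looks roughly like this; a sketch only, not the actual implementation (load_previous_messages, to_openai_response, and the kv_store interface are hypothetical names):

```python
# Conceptual sketch of serving the Responses API on top of any inference
# provider's openai_chat_completion endpoint. Helper names and the
# kv_store interface are hypothetical.
async def create_openai_response(request, inference_provider, kv_store):
    # Rebuild prior context server-side when previous_response_id is set.
    messages = []
    if request.previous_response_id:
        messages = await load_previous_messages(kv_store, request.previous_response_id)
    messages.append({"role": "user", "content": request.input})

    # The only inference-provider API needed is openai_chat_completion.
    chat_completion = await inference_provider.openai_chat_completion(
        model=request.model,
        messages=messages,
    )

    # Convert the chat completion choices into Responses output items and
    # persist the result so later turns can reference it by ID.
    response = to_openai_response(chat_completion)
    await kv_store.set(response.id, response)
    return response
```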

together.ai via Llama Stack - 100% Passing

Now the tests get more interesting, as we start testing against Llama 3.3 and Llama 4 model variants using a backend inference provider that has no native OpenAI Responses API support.

python -m pytest -s -v tests/verifications/openai_api/test_response.py --provider=together-llama-stack

Fireworks AI via Llama Stack - 77% Passing

This also uses Llama 3.3 and Llama 4 models. The web search tool tests are failing with this one, which appears to be an issue with the way tool calls are being constructed by the backend inference service.

python -m pytest -s -v tests/verifications/openai_api/test_response.py --provider=fireworks-llama-stack

Untested and/or not working yet

  • The file search built-in tool is not implemented yet (it will require the Files API).
  • Some of the current testing permutations are not testing the streaming variants yet. There are likely to be bugs there.
  • Custom function calling (as opposed to the built-in tools) is not tested yet and likely needs a small amount of code to wire things up. The expectation is that custom tools, including ones exposed via MCP, will work with the Responses API; see the sketch after this list.
  • The OpenAI built-in computer use tool is not implemented, and I don't have immediate plans to tackle that one.
  • Audio is not implemented yet, and not a high priority for my initial work.
  • Probably a lot of other things I'm forgetting - there are some TODO comments in the code, as well as some general cleanup / refactoring to do as I hack things together to get them working and later turn those hacks into maintainable code.
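For the custom function calling item above, the expected request shape follows OpenAI's function calling guide for the Responses API. A hedged sketch (the tool definition and model name are illustrative, and this path isn't wired up or tested in this PR yet):

```python
# Hedged sketch of a custom function tool with the Responses API; the tool
# definition and model name are illustrative, and `client` is an OpenAI
# client pointed at the Llama Stack server.
response = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    input="What's the weather in Paris?",
    tools=[{
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }],
)

# Unlike the built-in web search tool, a custom function call should come
# back to the client as a function_call output item rather than being
# executed server-side.
for item in response.output:
    if item.type == "function_call":
        print(item.name, item.arguments)
```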

bbrowning and others added 8 commits April 28, 2025 10:37
Signed-off-by: Ben Browning <bbrownin@redhat.com>
This extracts out a helper method to convert previous responses to messages and to convert OpenAI choices (from a chat completion response) into output messages for the OpenAI Responses output.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
This moves the OpenAI Responses API tests under
tests/verifications/openai_api/test_response.py and starts to wire
them up to our verification suite, so that we can test multiple
providers as well as OpenAI directly for the Responses API.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
@ashwinb ashwinb force-pushed the openai-responses-api branch from 70f726f to 78da660 on April 28, 2025 17:37
@ashwinb
Contributor

ashwinb commented Apr 28, 2025

I verified that:

pytest -s -v test_responses.py --provider=fireworks-llama-stack

worked for all 3 models. There was a small issue around tool calling, which I fixed.

@ashwinb
Contributor

ashwinb commented Apr 28, 2025

Given that this represents a major advance in terms of what the Stack can offer (even though the implementation is not complete or robust), I am landing it :) I also want to avoid conflicts!! :)

@ashwinb ashwinb merged commit 8dfce2f into llamastack:main Apr 28, 2025
37 of 39 checks passed
@bbrowning
Collaborator Author

Nice, thanks for the fixes and landing this! I'll work on setting up some nightly testing for this across lots of our providers so we have a baseline to track and expand compatibility with over time.

@bbrowning
Collaborator Author

I ran the new OpenAI Responses API verification tests from main immediately after this PR landed, and here's the current status of the other providers I tested:

  • OpenAI provider passed 16/16 (100%) of tests across their gpt-4o and gpt-4o-mini models.
  • Together provider passed 22/22 (100%) of tests across their meta-llama/Llama-3.3-70B-Instruct-Turbo, meta-llama/Llama-4-Scout-17B-16E-Instruct, and meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 models.
  • Groq provider passed 20/22 (91%) of tests across their llama-3.3-70b-versatile, meta-llama/llama-4-scout-17b-16e-instruct, and meta-llama/llama-4-maverick-17b-128e-instruct models.

It will take me a bit longer to get nightly testing set up for this and to gather manual test results using the ollama and remote-vllm providers. But the changes generally look good across the hosted providers so far.

ashwinb added a commit to llamastack/llama-stack-client-python that referenced this pull request Apr 28, 2025
@ashwinb
Contributor

ashwinb commented Apr 29, 2025

The expectation is that custom tools, including ones via MCP, will work with the Responses API.

MCP can work transparently via server side execution. But custom tools need to come back as a tool call to the client, right?

@bbrowning
Collaborator Author

The expectation is that custom tools, including ones via MCP, will work with the Responses API.

MCP can work transparently via server side execution. But custom tools need to come back as a tool call to the client, right?

Yes, custom tools will come back to the client as a tool call - see https://platform.openai.com/docs/guides/function-calling?api-mode=responses. We will have a decision to make (or a decision to expose to the person running the Llama Stack server) as to which (if any) MCP tools we invoke automatically via server-side execution and which we send back to the client. In other words, we may choose to let the person configuring a Stack extend the list of "builtin" tools that are automatically called, from an OpenAI Responses API point of view, to include some MCP tools.
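A hedged sketch of that round trip (run_my_tool stands in for whatever the client executes locally, and the model name is illustrative):

```python
import json

# `response` is a Responses API result whose output contains a
# function_call item (as in the custom-tool sketch earlier in this thread);
# run_my_tool is a hypothetical local tool implementation.
tool_call = next(item for item in response.output if item.type == "function_call")
result = run_my_tool(tool_call.name, **json.loads(tool_call.arguments))

# Return the tool result in a follow-up request, using previous_response_id
# so the server rebuilds the rest of the conversation context.
followup = client.responses.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    previous_response_id=response.id,
    input=[{
        "type": "function_call_output",
        "call_id": tool_call.call_id,
        "output": str(result),
    }],
)
print(followup.output_text)
```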

franciscojavierarceo pushed a commit to franciscojavierarceo/llama-stack that referenced this pull request May 9, 2025