[Serve.llm] Support OpenAI Responses (Stateful) API #55631

@nrghosh

Description

The OpenAI Responses API is a new stateful interface that adds several features on top of the previous stateless APIs (e.g. /chat/completions).

Guidance from OpenAI is to build new projects against this API instead of chat/completions - though they promise (for now) indefinite support for chat/completions. They are sunsetting the Assistants API, which the new Responses API purportedly replaces.

The main benefit of this new endpoint is simplifying workflows that involve tool use, code execution, and state management. As with the Assistants API, this happens server-side - unlike the Chat Completions API, where the client has to maintain state and send it back and forth in each request/prompt.

By using "store": true with the new API (and including "previous_response_id": response_id in follow-up requests), conversations become stateful.
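As a minimal sketch of that flow, the two request payloads to /v1/responses might look like the following. The field names ("model", "input", "store", "previous_response_id") come from the Responses API; the model name and response id used here are placeholders, not values from this issue.

```python
# Sketch of the stateful two-turn flow against /v1/responses.
# Payloads are built as plain dicts to show the relevant fields;
# the model name and "resp_abc123" id are illustrative placeholders.

def build_initial_request(model: str, prompt: str) -> dict:
    """First turn: ask the server to persist the response state."""
    return {
        "model": model,
        "input": prompt,
        "store": True,  # server keeps the conversation state
    }

def build_followup_request(model: str, prompt: str, previous_response_id: str) -> dict:
    """Later turns reference the stored response instead of
    resending the full message history, as chat/completions requires."""
    return {
        "model": model,
        "input": prompt,
        "previous_response_id": previous_response_id,
    }

first = build_initial_request("gpt-4.1", "Summarize this repository.")
# ... POST `first` to /v1/responses; the reply carries an id such as "resp_abc123" ...
followup = build_followup_request("gpt-4.1", "Now shorten it.", "resp_abc123")
```

The key contrast with chat/completions is that the follow-up payload carries only the new prompt plus a response id; the prior conversation lives server-side.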

Current State

  • Some initial support for the OpenAI Responses API has been merged into vLLM
  • Ongoing work on fully supporting Responses on the vLLM side
  • vLLM discussion thread about /responses and issues with supporting a stateful API
  • Migration guide from chat/completions -> responses with code examples from OpenAI blogpost
  • Context on Harmony Response Format (OpenAI OSS)
  • Blog on the main differences by Simon Willison
