
Conversation


@Zaragoto Zaragoto commented Oct 20, 2025

Background

  • Problem: The Chat API could previously accept and reuse a client-provided request_id. Under duplicated IDs, this may lead to collisions, state leaks, mismatched responses, or confusing observability.
  • Related issue: [Bug]: Duplicate request_id breaks the engine #10583

What’s changed

  • Internal/server-generated ID
    • The P-side always generates a globally unique internal_request_id (e.g., UUID/ULID) for all internal processing (routing, queue keys, logs/traces).
  • Response behavior
    • If the client supplies request_id, the server still uses the internal_request_id internally but echoes the client-supplied request_id in the API response for backward compatibility.
    • If the client omits request_id, the server returns the internal_request_id in the response.
  • Cross-component propagation (P→D)
    • The P-side passes the internal_request_id to the D-side via kv_transfer_params.
    • The D-side uses this propagated internal_request_id for its processing and logging, ensuring the same internal ID is used consistently across both P and D.

Behavioral changes

  • Before: Client request_id might be used internally and echoed back, risking collisions across components.
  • Now:
    • Internally: Always use a server-generated internal_request_id. The same internal_request_id is propagated P→D via kv_transfer_params.
    • Externally (response):
      • If client supplied request_id, echo that value in the response.
      • Otherwise, return the server-generated internal_request_id.


@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the frontend label Oct 20, 2025

mergify bot commented Oct 20, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Zaragoto.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 20, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to improve the Chat API by forcing the use of server-generated request_ids for internal processing to prevent collisions, while still echoing client-provided IDs for backward compatibility. The changes correctly propagate this new internal ID across components. However, there is a critical flaw in the implementation: the internal request_id is not always newly generated if a client provides one, which defeats the primary purpose of this change. I have provided a comment with a suggested fix to ensure a unique, server-generated ID is always used internally, aligning the code with the stated goal of the pull request.

Comment on lines +259 to +270
request_id = self._add_prefix(getattr(request, "request_id", None))

request_id = (
    f"chatcmpl-{self._base_request_id(raw_request, request.request_id)}"
)
if request.kv_transfer_params:
    request_id = self._add_prefix(
        request.kv_transfer_params.get("p_side_request_id", request_id))

default = request.request_id or request_id
if raw_request is None:
    req_id_head = default
else:
    req_id_head = raw_request.headers.get("X-Request-ID")

raw_request_id = self._add_prefix(req_id_head) if req_id_head else request_id

critical

The current logic for determining the internal request_id does not align with the PR's goal of forcing a server-generated ID to prevent collisions. When a client provides a request_id, it is used as the basis for the internal ID, which can lead to the very collisions this change aims to prevent.

The internal request ID should always be a newly generated, globally unique identifier on the P-side to ensure system stability and clear observability. The client-provided ID should only be used for the response to maintain backward compatibility.

I've suggested a refactoring of this logic to correctly and clearly separate the generation of the internal ID from the determination of the response ID. This change makes the code safer and easier to understand.

        # P-side always generates a globally unique internal_request_id.
        # D-side receives it from P-side.
        if request.kv_transfer_params:
            # D-side: use the ID from P-side. If it's not present,
            # generate a new one to avoid using a client-provided ID.
            internal_request_id = self._add_prefix(
                request.kv_transfer_params.get("p_side_request_id"))
        else:
            # P-side: always generate a new unique ID for internal processing.
            internal_request_id = self._add_prefix(None)

        # For backward compatibility, the request_id in the response is
        # determined by the following order of precedence:
        # 1. X-Request-ID header
        # 2. client-supplied request.request_id
        # 3. server-generated internal_request_id
        header_id = raw_request.headers.get("X-Request-ID") if raw_request else None
        client_id = getattr(request, "request_id", None)

        # The ID to be returned in the response.
        response_id = self._add_prefix(header_id or client_id or internal_request_id)

        # Use the unique internal ID for all internal processing.
        request_id = internal_request_id
        # Use the determined response ID for the final response.
        raw_request_id = response_id


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +259 to +262
request_id = self._add_prefix(getattr(request, "request_id", None))

request_id = (
    f"chatcmpl-{self._base_request_id(raw_request, request.request_id)}"
)
if request.kv_transfer_params:
    request_id = self._add_prefix(
        request.kv_transfer_params.get("p_side_request_id", request_id))


P1 Badge Stop using client request_id as internal identifier

The new logic still seeds the internal request_id from request.request_id and from kv_transfer_params["p_side_request_id"], only adding a prefix. When a client provides either of these fields, the value is used verbatim for routing, logging and the engine call, so the server never generates a unique ID of its own. This leaves the system vulnerable to the same request-id collisions the change set is intended to prevent. The internal ID should be generated with random_uuid() regardless of client input and any client-supplied identifier should be kept separate for response echoing only.


Comment on lines +264 to +270
default = request.request_id or request_id
if raw_request is None:
    req_id_head = default
else:
    req_id_head = raw_request.headers.get("X-Request-ID")

raw_request_id = self._add_prefix(req_id_head) if req_id_head else request_id


P1 Badge Response request_id no longer reflects client-provided value

The response ID returned to the user is computed from raw_request.headers['X-Request-ID'] or the internal request_id (which may be random). When a client supplies request_id in the request body but no X-Request-ID header, raw_request_id is set to the internally generated ID, not the client value. This contradicts the stated goal of echoing the client-supplied ID for backward compatibility and may break clients that rely on receiving their original request ID.


Contributor

cjackal commented Oct 20, 2025

+1 on the goal of this PR; the current state of request handling in vLLM is quite disruptive, as the OP mentioned. Many benchmark tools (including the 'vllm bench' CLI) mark each record with a fixed request ID, so if two clients run the same benchmark at the same time, they mostly get 500 Internal Server Error responses, keeping vLLM one step further from serving as a shared model server for AI researchers.

Member

markmc commented Oct 20, 2025

Background

  • Problem: The Chat API could previously accept and reuse a client-provided request_id. Under duplicated IDs, this may lead to collisions, state leaks, mismatched responses, or confusing observability.

Thank you. This is an important issue. It is likely also to be the root cause of #26929 for example

May I suggest the following:

  • Our solution needs to be comprehensive, beyond the chat API - consider that any API call that results in an engine.generate() or engine.encode() call needs to handle the request ID in this way, but even other components like the input processor may key off the request ID too. See all the users of _base_request_id()

  • That we should only key off "a globally unique internal_request_id" (e.g., UUID/ULID) is a good principle

  • The primary use case (AIUI) for supporting a user-supplied request ID is logging (and maybe tracing?) - it enables correlation of logs where multiple instances are involved with processing a single request

  • For P/D we should assume an external proxy generates a request ID and supplies it to both the P and D instances. We don't need to support passing the P-generated ID to D

  • And so, ITSM we will need a RequestID that includes both the user-supplied ID and the internally-generated unique ID, and the code will need to be updated so that we log the former and key off the latter. That's probably going to be an invasive change.

Does that make sense?

Member

markmc commented Oct 20, 2025

  • For P/D we should assume an external proxy generates a request ID and supplies it to both the P and D instances. We don't need to support passing the P-generated ID to D

I forgot ... P does need to supply (via KV transfer params) its internal-request-ID to D ... at least in the case of NIXL, this forms part of the notification message sent from D to P (see _read_blocks())
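Concretely, the round-trip described here might look like the following; the `p_side_request_id` field name comes from this PR's diff, and the rest is illustrative:

```python
import uuid

# P-side: mint the internal ID and attach it to the KV transfer params
# that accompany the request to the D instance.
p_internal_id = uuid.uuid4().hex
kv_transfer_params = {"p_side_request_id": p_internal_id}

# D-side: recover P's internal ID so the notification sent back to P
# (e.g. after the NIXL connector's _read_blocks()) refers to a request
# ID that P actually knows about.
notify_id = kv_transfer_params["p_side_request_id"]
```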

Member

markmc commented Oct 22, 2025

xref #15326

@Zaragoto
Author

Background

  • Problem: The Chat API could previously accept and reuse a client-provided request_id. Under duplicated IDs, this may lead to collisions, state leaks, mismatched responses, or confusing observability.

Thank you. This is an important issue. It is likely also to be the root cause of #26929 for example

May I suggest the following:

  • Our solution needs to be comprehensive, beyond the chat API - consider that any API call that results in an engine.generate() or engine.encode() call needs to handle the request ID in this way, but even other components like the input processor may key off the request ID too. See all the users of _base_request_id()
  • That we should only key off "a globally unique internal_request_id" (e.g., UUID/ULID) is a good principle
  • The primary use case (AIUI) for supporting a user-supplied request ID is logging (and maybe tracing?) - it enables correlation of logs where multiple instances are involved with processing a single request
  • For P/D we should assume an external proxy generates a request ID and supplies it to both the P and D instances. We don't need to support passing the P-generated ID to D
  • And so, ITSM we will need a RequestID that includes both the user-supplied ID and the internally-generated unique ID, and the code will need to be updated so that we log the former and key off the latter. That's probably going to be an invasive change.

Does that make sense?

Thanks for your suggestion. We will update the PR to make it more general.

@Zaragoto
Author

  • For P/D we should assume an external proxy generates a request ID and supplies it to both the P and D instances. We don't need to support passing the P-generated ID to D

I forgot ... P does need to supply (via KV transfer params) its internal-request-ID to D ... at least in the case of NIXL, this forms part of the notification message sent from D to P (see _read_blocks())

Yes. This is why we pass the internal req-id in P side to D side :)
