
Conversation


@Zaragoto Zaragoto commented Oct 20, 2025

Background

  • Problem: The Chat API could previously accept and reuse a client-provided request_id. Under duplicated IDs, this may lead to collisions, state leaks, mismatched responses, or confusing observability.
  • Related issue: [Bug]: Duplicate request_id breaks the engine #10583

What’s changed

  • Internal/server-generated ID
    • The P-side always generates a globally unique internal_request_id (e.g., UUID/ULID) for all internal processing (routing, queue keys, logs/traces).
  • Response behavior
    • If the client supplies request_id, the server still uses the internal_request_id internally but echoes the client-supplied request_id in the API response for backward compatibility.
    • If the client omits request_id, the server returns the internal_request_id in the response.
  • Cross-component propagation (P→D)
    • The P-side passes the internal_request_id to the D-side via kv_transfer_params.
    • The D-side uses this propagated internal_request_id for its processing and logging, ensuring the same internal ID is used consistently across both P and D.

Behavioral changes

  • Before: Client request_id might be used internally and echoed back, risking collisions across components.
  • Now:
    • Internally: Always use a server-generated internal_request_id. The same internal_request_id is propagated P→D via kv_transfer_params.
    • Externally (response):
      • If client supplied request_id, echo that value in the response.
      • Otherwise, return the server-generated internal_request_id.


@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the frontend label Oct 20, 2025

mergify bot commented Oct 20, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Zaragoto.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 20, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to improve the Chat API by forcing the use of server-generated request_ids for internal processing to prevent collisions, while still echoing client-provided IDs for backward compatibility. The changes correctly propagate this new internal ID across components. However, there is a critical flaw in the implementation: the internal request_id is not always newly generated if a client provides one, which defeats the primary purpose of this change. I have provided a comment with a suggested fix to ensure a unique, server-generated ID is always used internally, aligning the code with the stated goal of the pull request.

Comment on lines +259 to +270
request_id = self._add_prefix(getattr(request, "request_id", None))

request_id = (
    f"chatcmpl-{self._base_request_id(raw_request, request.request_id)}"
)
if request.kv_transfer_params:
    request_id = self._add_prefix(
        request.kv_transfer_params.get("p_side_request_id", request_id))

default = request.request_id or request_id
if raw_request is None:
    req_id_head = default
else:
    req_id_head = raw_request.headers.get("X-Request-ID")

raw_request_id = self._add_prefix(req_id_head) if req_id_head else request_id

critical

The current logic for determining the internal request_id does not align with the PR's goal of forcing a server-generated ID to prevent collisions. When a client provides a request_id, it is used as the basis for the internal ID, which can lead to the very collisions this change aims to prevent.

The internal request ID should always be a newly generated, globally unique identifier on the P-side to ensure system stability and clear observability. The client-provided ID should only be used for the response to maintain backward compatibility.

I've suggested a refactoring of this logic to correctly and clearly separate the generation of the internal ID from the determination of the response ID. This change makes the code safer and easier to understand.

        # P-side always generates a globally unique internal_request_id.
        # D-side receives it from P-side.
        if request.kv_transfer_params:
            # D-side: use the ID from P-side. If it's not present,
            # generate a new one to avoid using a client-provided ID.
            internal_request_id = self._add_prefix(
                request.kv_transfer_params.get("p_side_request_id"))
        else:
            # P-side: always generate a new unique ID for internal processing.
            internal_request_id = self._add_prefix(None)

        # For backward compatibility, the request_id in the response is
        # determined by the following order of precedence:
        # 1. X-Request-ID header
        # 2. client-supplied request.request_id
        # 3. server-generated internal_request_id
        header_id = raw_request.headers.get("X-Request-ID") if raw_request else None
        client_id = getattr(request, "request_id", None)

        # The ID to be returned in the response.
        response_id = self._add_prefix(header_id or client_id or internal_request_id)

        # Use the unique internal ID for all internal processing.
        request_id = internal_request_id
        # Use the determined response ID for the final response.
        raw_request_id = response_id


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +259 to +262
request_id = self._add_prefix(getattr(request, "request_id", None))

request_id = (
    f"chatcmpl-{self._base_request_id(raw_request, request.request_id)}"
)
if request.kv_transfer_params:
    request_id = self._add_prefix(
        request.kv_transfer_params.get("p_side_request_id", request_id))


P1 Badge Stop using client request_id as internal identifier

The new logic still seeds the internal request_id from request.request_id and from kv_transfer_params["p_side_request_id"], only adding a prefix. When a client provides either of these fields, the value is used verbatim for routing, logging and the engine call, so the server never generates a unique ID of its own. This leaves the system vulnerable to the same request-id collisions the change set is intended to prevent. The internal ID should be generated with random_uuid() regardless of client input and any client-supplied identifier should be kept separate for response echoing only.


Comment on lines +264 to +270
default = request.request_id or request_id
if raw_request is None:
    req_id_head = default
else:
    req_id_head = raw_request.headers.get("X-Request-ID")

raw_request_id = self._add_prefix(req_id_head) if req_id_head else request_id


P1 Badge Response request_id no longer reflects client-provided value

The response ID returned to the user is computed from raw_request.headers['X-Request-ID'] or the internal request_id (which may be random). When a client supplies request_id in the request body but no X-Request-ID header, raw_request_id is set to the internally generated ID, not the client value. This contradicts the stated goal of echoing the client-supplied ID for backward compatibility and may break clients that rely on receiving their original request ID.


Contributor

cjackal commented Oct 20, 2025

+1 on the goal of this PR; the current state of request handling in vLLM is quite disruptive, as the OP mentioned. Many benchmark tools (including the 'vllm bench' CLI) mark each record with a fixed request ID, so if two clients run the same benchmark at the same time, they mostly get 500 Internal Server Error responses, keeping vLLM one step further from serving as a shared model server for AI researchers.

Member

markmc commented Oct 20, 2025

Background

  • Problem: The Chat API could previously accept and reuse a client-provided request_id. Under duplicated IDs, this may lead to collisions, state leaks, mismatched responses, or confusing observability.

Thank you. This is an important issue. It is likely also to be the root cause of #26929 for example

May I suggest the following:

  • Our solution needs to be comprehensive, beyond the chat API - consider that any API call that results in an engine.generate() or engine.encode() call needs to handle the request ID in this way, but even other components like the input processor may key off the request ID too. See all the users of _base_request_id()

  • That we should only key off "a globally unique internal_request_id" (e.g., UUID/ULID) is a good principle

  • The primary use case (AIUI) for supporting a user-supplied request ID is logging (and maybe tracing?) - it enables correlation of logs where multiple instances are involved with processing a single request

  • For P/D we should assume an external proxy generates a request ID and supplies it to both the P and D instances. We don't need to support passing the P-generated ID to D

  • And so, ITSM we will need a RequestID that includes both the user-supplied ID and the internally-generated unique ID, and the code will need to be updated so that we log the former and key off the latter. That's probably going to be an invasive change.

Does that make sense?

Member

markmc commented Oct 20, 2025

  • For P/D we should assume an external proxy generates a request ID and supplies it to both the P and D instances. We don't need to support passing the P-generated ID to D

I forgot ... P does need to supply (via KV transfer params) its internal-request-ID to D ... at least in the case of NIXL, this forms part of the notification message sent from D to P (see _read_blocks())
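Concretely, the round-trip described here might look like the following; the `p_side_request_id` field name comes from this PR's diff, and the rest is illustrative:

```python
import uuid

# P-side: mint the internal ID and attach it to the KV transfer params
# that accompany the request to the D instance.
p_internal_id = uuid.uuid4().hex
kv_transfer_params = {"p_side_request_id": p_internal_id}

# D-side: recover P's internal ID so the notification sent back to P
# (e.g. after the NIXL connector's _read_blocks()) refers to a request
# ID that P actually knows about.
notify_id = kv_transfer_params["p_side_request_id"]
```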

Member

markmc commented Oct 22, 2025

xref #15326

@Zaragoto
Author

Background

  • Problem: The Chat API could previously accept and reuse a client-provided request_id. Under duplicated IDs, this may lead to collisions, state leaks, mismatched responses, or confusing observability.

Thank you. This is an important issue. It is likely also to be the root cause of #26929 for example

May I suggest the following:

  • Our solution needs to be comprehensive, beyond the chat API - consider that any API call that results in an engine.generate() or engine.encode() call needs to handle the request ID in this way, but even other components like the input processor may key off the request ID too. See all the users of _base_request_id()
  • That we should only key off "a globally unique internal_request_id" (e.g., UUID/ULID) is a good principle
  • The primary use case (AIUI) for supporting a user-supplied request ID is logging (and maybe tracing?) - it enables correlation of logs where multiple instances are involved with processing a single request
  • For P/D we should assume an external proxy generates a request ID and supplies it to both the P and D instances. We don't need to support passing the P-generated ID to D
  • And so, ITSM we will need a RequestID that includes both the user-supplied ID and the internally-generated unique ID, and the code will need to be updated so that we log the former and key off the latter. That's probably going to be an invasive change.

Does that make sense?

Thanks for your suggestion. We will update the PR to make it more general.

@Zaragoto
Author

  • For P/D we should assume an external proxy generates a request ID and supplies it to both the P and D instances. We don't need to support passing the P-generated ID to D

I forgot ... P does need to supply (via KV transfer params) its internal-request-ID to D ... at least in the case of NIXL, this forms part of the notification message sent from D to P (see _read_blocks())

Yes. This is why we pass the internal req-id in P side to D side :)
