-
-
Notifications
You must be signed in to change notification settings - Fork 11.4k
Chat API: Force server-generated request_id to avoid collisions; improve clarity and safety #27189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
modify the code to make it more clear and safer
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
|
This pull request has merge conflicts that must be resolved before it can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request aims to improve the Chat API by forcing the use of server-generated request_ids for internal processing to prevent collisions, while still echoing client-provided IDs for backward compatibility. The changes correctly propagate this new internal ID across components. However, there is a critical flaw in the implementation: the internal request_id is not always newly generated if a client provides one, which defeats the primary purpose of this change. I have provided a comment with a suggested fix to ensure a unique, server-generated ID is always used internally, aligning the code with the stated goal of the pull request.
| request_id = self._add_prefix(getattr(request, "request_id", None)) | ||
|
|
||
| request_id = ( | ||
| f"chatcmpl-{self._base_request_id(raw_request, request.request_id)}" | ||
| ) | ||
| if request.kv_transfer_params: | ||
| request_id = self._add_prefix(request.kv_transfer_params.get("p_side_request_id", request_id)) | ||
|
|
||
| default = request.request_id or request_id | ||
| if raw_request is None: | ||
| req_id_head = default | ||
| else: | ||
| req_id_head = raw_request.headers.get("X-Request-ID") | ||
|
|
||
| raw_request_id = self._add_prefix(req_id_head) if req_id_head else request_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current logic for determining the internal request_id does not align with the PR's goal of forcing a server-generated ID to prevent collisions. When a client provides a request_id, it is used as the basis for the internal ID, which can lead to the very collisions this change aims to prevent.
The internal request ID should always be a newly generated, globally unique identifier on the P-side to ensure system stability and clear observability. The client-provided ID should only be used for the response to maintain backward compatibility.
I've suggested a refactoring of this logic to correctly and clearly separate the generation of the internal ID from the determination of the response ID. This change makes the code safer and easier to understand.
# P-side always generates a globally unique internal_request_id.
# D-side receives it from P-side.
if request.kv_transfer_params:
# D-side: use the ID from P-side. If it's not present,
# generate a new one to avoid using a client-provided ID.
internal_request_id = self._add_prefix(
request.kv_transfer_params.get("p_side_request_id"))
else:
# P-side: always generate a new unique ID for internal processing.
internal_request_id = self._add_prefix(None)
# For backward compatibility, the request_id in the response is
# determined by the following order of precedence:
# 1. X-Request-ID header
# 2. client-supplied request.request_id
# 3. server-generated internal_request_id
header_id = raw_request.headers.get("X-Request-ID") if raw_request else None
client_id = getattr(request, "request_id", None)
# The ID to be returned in the response.
response_id = self._add_prefix(header_id or client_id or internal_request_id)
# Use the unique internal ID for all internal processing.
request_id = internal_request_id
# Use the determined response ID for the final response.
raw_request_id = response_idThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| request_id = self._add_prefix(getattr(request, "request_id", None)) | ||
|
|
||
| request_id = ( | ||
| f"chatcmpl-{self._base_request_id(raw_request, request.request_id)}" | ||
| ) | ||
| if request.kv_transfer_params: | ||
| request_id = self._add_prefix(request.kv_transfer_params.get("p_side_request_id", request_id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stop using client request_id as internal identifier
The new logic still seeds the internal request_id from request.request_id and from kv_transfer_params["p_side_request_id"], only adding a prefix. When a client provides either of these fields, the value is used verbatim for routing, logging and the engine call, so the server never generates a unique ID of its own. This leaves the system vulnerable to the same request-id collisions the change set is intended to prevent. The internal ID should be generated with random_uuid() regardless of client input and any client-supplied identifier should be kept separate for response echoing only.
Useful? React with 👍 / 👎.
| default = request.request_id or request_id | ||
| if raw_request is None: | ||
| req_id_head = default | ||
| else: | ||
| req_id_head = raw_request.headers.get("X-Request-ID") | ||
|
|
||
| raw_request_id = self._add_prefix(req_id_head) if req_id_head else request_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Response request_id no longer reflects client-provided value
The response ID returned to the user is computed from raw_request.headers['X-Request-ID'] or the internal request_id (which may be random). When a client supplies request_id in the request body but no X-Request-ID header, raw_request_id is set to the internally generated ID, not the client value. This contradicts the stated goal of echoing the client-supplied ID for backward compatibility and may break clients that rely on receiving their original request ID.
Useful? React with 👍 / 👎.
|
+1 on the goal of this PR; the current status of request handling in vllm is quite disruptive as mentioned by the OP. Many benchmark tools (including 'vllm bench' CLI) mark each record by fixed request id, so if two clients run the same benchmark at the same time, they mostly got 500 internal server error, letting vLLM one step farther from a shared model server for AI researchers. |
Thank you. This is an important issue. It is likely also to be the root cause of #26929 for example May I suggest the following:
Does that make sense? |
I forgot ... P does need to supply (via KV transfer params) its internal-request-ID to D ... at least in the case of NIXL, this forms part of the notification message sent from D to P (see |
|
xref #15326 |
Thanks for your suggestion. We will update the PR to make it more general. |
Yes. This is why we pass the internal req-id in P side to D side :) |
Background
What’s changed
Behavioral changes
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.