Conversation


@ekagra-ranjan ekagra-ranjan commented Oct 2, 2025

There are duplicate request ids during oversampling in `vllm bench serve`, which causes errors. The server throws these logs:

(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145] Error in chat completion stream generator.
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145] Traceback (most recent call last):
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]   File "/host/vllm-ekagra/vllm/vllm/entrypoints/openai/serving_chat.py", line 574, in chat_completion_stream_generator
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]     async for res in result_generator:
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]   File "/host/vllm-ekagra/vllm/vllm/v1/engine/async_llm.py", line 376, in generate
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]     q = await self.add_request(
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]         ^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]   File "/host/vllm-ekagra/vllm/vllm/v1/engine/async_llm.py", line 289, in add_request
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]     await self._add_request(request, prompt_str, None, 0, queue)
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]   File "/host/vllm-ekagra/vllm/vllm/v1/engine/async_llm.py", line 315, in _add_request
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]     self.output_processor.add_request(request, prompt, parent_req, index,
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]   File "/host/vllm-ekagra/vllm/vllm/v1/engine/output_processor.py", line 366, in add_request
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]     raise ValueError(f"Request id {request_id} already running.")
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145] ValueError: Request id chatcmpl-benchmark-serving146 already running.

This is because a single `deepcopy` of the list returned by `random.choices` does not produce independent copies of elements that were sampled more than once: `deepcopy` memoizes by object identity, so every repeat of the same request maps to one shared copy. If `random.choices` samples a request multiple times, the request id set during the last iteration is therefore shared by all instances of that request. This PR fixes it by deep-copying each sampled request individually. The PR also adds an assertion to confirm that no duplicate requests are present; that same assertion fails when the changes in this PR are not applied.
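The aliasing described above can be reproduced in isolation. This is a minimal sketch (the `Request` class and attribute names here are illustrative, not vLLM's actual benchmark types) contrasting one `deepcopy` of the whole sampled list with a per-element `deepcopy`:

```python
import copy
import random

class Request:
    """Stand-in for a sampled benchmark request (hypothetical shape)."""
    def __init__(self, prompt):
        self.prompt = prompt
        self.request_id = None

pool = [Request("a"), Request("b")]
# k > len(pool) guarantees at least one request is sampled twice.
sampled = random.choices(pool, k=4)

# Buggy pattern: deepcopy the list in one call. deepcopy's memo dict maps
# duplicate elements to a single copy, so repeats still alias each other.
buggy = copy.deepcopy(sampled)
buggy_identities = {id(r) for r in buggy}

# Fixed pattern: deepcopy each sampled element individually, so every
# instance is an independent object with its own request_id slot.
fixed = [copy.deepcopy(r) for r in sampled]
fixed_identities = {id(r) for r in fixed}

print(len(buggy_identities), len(fixed_identities))  # at most 2, then 4
```

With the buggy pattern, assigning `request_id` to the last occurrence of a repeated request silently overwrites the id on every other occurrence, which is exactly what produces the "already running" server error.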

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
@mergify mergify bot added the performance Performance-related issues label Oct 2, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly fixes a bug in the request oversampling logic that caused duplicate request IDs. The original implementation with deepcopy(random.choices(...)) did not create unique objects for requests sampled multiple times. The fix, which involves deep-copying requests individually within a loop, effectively resolves this issue. Additionally, a validation check has been added to detect duplicate request IDs. However, I've found a flaw in this new validation logic that could allow duplicates to go undetected under certain conditions and have suggested a correction.
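For reference, a robust duplicate-id validation can be written so that no ordering condition can let duplicates slip through. This is a hypothetical sketch, not the PR's actual assertion; the helper name and `Req` class are invented for illustration:

```python
def assert_unique_request_ids(requests):
    """Fail if any request id appears more than once, regardless of order."""
    request_ids = [req.request_id for req in requests]
    duplicates = len(request_ids) - len(set(request_ids))
    assert duplicates == 0, f"{duplicates} duplicate request id(s) found"

class Req:
    """Minimal stand-in carrying only a request_id (illustrative)."""
    def __init__(self, request_id):
        self.request_id = request_id

assert_unique_request_ids([Req("a"), Req("b")])  # passes
try:
    assert_unique_request_ids([Req("a"), Req("a")])
except AssertionError as e:
    print("caught:", e)
```

Comparing the list length against the length of its `set` catches every repeat in one pass, which avoids the class of flaw where a pairwise or adjacency-based check misses non-adjacent duplicates.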

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 2, 2025
@ywang96 ywang96 enabled auto-merge (squash) October 2, 2025 22:26
@ywang96 ywang96 merged commit ad2d788 into vllm-project:main Oct 3, 2025
47 checks passed
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
karan pushed a commit to karan/vllm that referenced this pull request Oct 6, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>
southfreebird pushed a commit to southfreebird/vllm that referenced this pull request Oct 7, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
sducouedic pushed a commit to sducouedic/vllm that referenced this pull request Oct 16, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>