Conversation


@ekagra-ranjan ekagra-ranjan commented Oct 2, 2025

There are duplicate request ids during oversampling in `vllm bench serve`, which causes errors. The server throws these logs:

(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145] Error in chat completion stream generator.
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145] Traceback (most recent call last):
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]   File "/host/vllm-ekagra/vllm/vllm/entrypoints/openai/serving_chat.py", line 574, in chat_completion_stream_generator
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]     async for res in result_generator:
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]   File "/host/vllm-ekagra/vllm/vllm/v1/engine/async_llm.py", line 376, in generate
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]     q = await self.add_request(
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]         ^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]   File "/host/vllm-ekagra/vllm/vllm/v1/engine/async_llm.py", line 289, in add_request
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]     await self._add_request(request, prompt_str, None, 0, queue)
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]   File "/host/vllm-ekagra/vllm/vllm/v1/engine/async_llm.py", line 315, in _add_request
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]     self.output_processor.add_request(request, prompt, parent_req, index,
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]   File "/host/vllm-ekagra/vllm/vllm/v1/engine/output_processor.py", line 366, in add_request
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145]     raise ValueError(f"Request id {request_id} already running.")
(APIServer pid=27557) ERROR 10-02 13:53:06 [serving_chat.py:1145] ValueError: Request id chatcmpl-benchmark-serving146 already running.

This is because a single `deepcopy` of the list returned by `random.choices` does not produce independent copies of elements that were sampled more than once: `deepcopy` memoizes by object identity, so every repeat of the same request maps to one shared copy. If `random.choices` samples a request multiple times, the request id set during the last iteration is therefore shared by all instances of that request. This PR fixes it by deep-copying each sampled request individually. The PR also adds an assertion to confirm that no duplicate requests are present; that same assertion fails when the changes in this PR are not applied.
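The aliasing described above can be reproduced in isolation. This is a minimal sketch (the `Request` class and attribute names here are illustrative, not vLLM's actual benchmark types) contrasting one `deepcopy` of the whole sampled list with a per-element `deepcopy`:

```python
import copy
import random

class Request:
    """Stand-in for a sampled benchmark request (hypothetical shape)."""
    def __init__(self, prompt):
        self.prompt = prompt
        self.request_id = None

pool = [Request("a"), Request("b")]
# k > len(pool) guarantees at least one request is sampled twice.
sampled = random.choices(pool, k=4)

# Buggy pattern: deepcopy the list in one call. deepcopy's memo dict maps
# duplicate elements to a single copy, so repeats still alias each other.
buggy = copy.deepcopy(sampled)
buggy_identities = {id(r) for r in buggy}

# Fixed pattern: deepcopy each sampled element individually, so every
# instance is an independent object with its own request_id slot.
fixed = [copy.deepcopy(r) for r in sampled]
fixed_identities = {id(r) for r in fixed}

print(len(buggy_identities), len(fixed_identities))  # at most 2, then 4
```

With the buggy pattern, assigning `request_id` to the last occurrence of a repeated request silently overwrites the id on every other occurrence, which is exactly what produces the "already running" server error.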

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
@mergify mergify bot added the performance Performance-related issues label Oct 2, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly fixes a bug in the request oversampling logic that caused duplicate request IDs. The original implementation with deepcopy(random.choices(...)) did not create unique objects for requests sampled multiple times. The fix, which involves deep-copying requests individually within a loop, effectively resolves this issue. Additionally, a validation check has been added to detect duplicate request IDs. However, I've found a flaw in this new validation logic that could allow duplicates to go undetected under certain conditions and have suggested a correction.
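For reference, a robust duplicate-id validation can be written so that no ordering condition can let duplicates slip through. This is a hypothetical sketch, not the PR's actual assertion; the helper name and `Req` class are invented for illustration:

```python
def assert_unique_request_ids(requests):
    """Fail if any request id appears more than once, regardless of order."""
    request_ids = [req.request_id for req in requests]
    duplicates = len(request_ids) - len(set(request_ids))
    assert duplicates == 0, f"{duplicates} duplicate request id(s) found"

class Req:
    """Minimal stand-in carrying only a request_id (illustrative)."""
    def __init__(self, request_id):
        self.request_id = request_id

assert_unique_request_ids([Req("a"), Req("b")])  # passes
try:
    assert_unique_request_ids([Req("a"), Req("a")])
except AssertionError as e:
    print("caught:", e)
```

Comparing the list length against the length of its `set` catches every repeat in one pass, which avoids the class of flaw where a pairwise or adjacency-based check misses non-adjacent duplicates.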

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 2, 2025
@ywang96 ywang96 enabled auto-merge (squash) October 2, 2025 22:26
@ywang96 ywang96 merged commit ad2d788 into vllm-project:main Oct 3, 2025
47 checks passed
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
karan pushed a commit to karan/vllm that referenced this pull request Oct 6, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>
southfreebird pushed a commit to southfreebird/vllm that referenced this pull request Oct 7, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
sducouedic pushed a commit to sducouedic/vllm that referenced this pull request Oct 16, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>