
Conversation

@joerunde (Collaborator) commented Apr 8, 2025

This PR adds a generic validate_request API to the platform interface. It allows platforms to implement runtime checks on each request, ensuring that all requested features are supported before the request is scheduled. There is already one existing check in this category, supports_structured_output, and I'd like to avoid a proliferation of more platform APIs for individual features like this.

Currently, the spyre plugin needs to implement some extra validation around the shape of inputs, since we have tighter constraints on valid prompt lengths and max-token requests. This new API would let us do that without needing to hack around rejecting requests from the scheduler.
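
For illustration, here is a rough sketch of what a plugin platform's validate_request override could look like for this kind of shape validation. The hook signature follows the one proposed in this PR; MyAcceleratorPlatform and the warmup limit are hypothetical and not taken from the spyre plugin.

```python
# Hypothetical sketch of a plugin platform using the proposed hook to reject
# requests that don't fit its warmup shapes. MyAcceleratorPlatform and
# MAX_WARMUP_NEW_TOKENS are made up for illustration.
from typing import Union

from vllm.platforms.interface import Platform
from vllm.pooling_params import PoolingParams
from vllm.sampling_params import SamplingParams

MAX_WARMUP_NEW_TOKENS = 1024  # hypothetical largest warmed-up decode length


class MyAcceleratorPlatform(Platform):

    @classmethod
    def validate_request(
        cls,
        prompt,
        params: Union[SamplingParams, PoolingParams],
    ) -> None:
        """Raise ValueError if this request cannot be served on this platform."""
        if (isinstance(params, SamplingParams)
                and params.max_tokens is not None
                and params.max_tokens > MAX_WARMUP_NEW_TOKENS):
            raise ValueError(
                f"max_tokens={params.max_tokens} exceeds the largest "
                f"warmed-up decode length ({MAX_WARMUP_NEW_TOKENS})")
```

The engine's request processing can then call this hook for each incoming request and surface the error to the client, instead of scheduling a request that would fail on the device.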

FIX vllm-project/vllm-spyre#77

Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
github-actions bot commented Apr 8, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Apr 8, 2025
@njhill (Member) commented Apr 8, 2025

cc @NickLucche this is what we were discussing a couple of days ago...

Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

@njhill (Member) left a comment

Thanks @joerunde!

# TODO(woosuk): Support encoder-decoder models.

from vllm.platforms import current_platform
current_platform.validate_request(

@njhill (Member) commented:

Wondering whether we should remove the call to supports_structured_output and have the default impl of validate_request call that instead. Actually maybe we could remove the supports_structured_output interface method and have validate_request only call it if it exists in the same class?
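
To make the idea concrete, here is a sketch (illustrative only, not the code that was merged, and assuming the guided_decoding field on SamplingParams marks structured-output requests) of a default validate_request that keeps the structured-output check but only consults supports_structured_output when the concrete platform class still defines it:

```python
# Illustrative default implementation: delegate to supports_structured_output()
# only if the platform subclass defines that method, and reject structured
# output requests when it returns False.
from typing import Union

from vllm.pooling_params import PoolingParams
from vllm.sampling_params import SamplingParams


class Platform:

    @classmethod
    def validate_request(
        cls,
        prompt,
        params: Union[SamplingParams, PoolingParams],
    ) -> None:
        supports_so = getattr(cls, "supports_structured_output", None)
        if (supports_so is not None
                and isinstance(params, SamplingParams)
                and params.guided_decoding is not None
                and not supports_so()):
            raise ValueError(
                "Structured output is not supported on this platform")
```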

@joerunde (Collaborator Author) replied:

Sure. Maybe the simplest thing to do is to just add an impl for the TPU backend and have it reject structured output requests?

@joerunde (Collaborator Author) replied:

@njhill I went with the 🔥🔥🔥 option; WDYT?
The only difference in behavior now should be that all out-of-tree platforms will need to explicitly reject structured output in validate_request instead of inheriting the default impl of supports_structured_output.
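
Under that approach, an out-of-tree platform that does not support structured output would need something like the following (a hypothetical sketch, again assuming params.guided_decoding marks structured-output requests):

```python
# Hypothetical out-of-tree platform that now rejects structured output itself
# in validate_request instead of inheriting a default supports_structured_output.
from vllm.platforms.interface import Platform
from vllm.sampling_params import SamplingParams


class MyOutOfTreePlatform(Platform):

    @classmethod
    def validate_request(cls, prompt, params) -> None:
        if (isinstance(params, SamplingParams)
                and params.guided_decoding is not None):
            raise ValueError(
                "Structured output is not supported on this platform")
```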

cls,
prompt: PromptType,
params: Union[SamplingParams, PoolingParams],
lora_request: Optional[LoRARequest] = None,

@njhill (Member) commented:

Do we need to include lora_request here? Wouldn't a platform either support LoRA or not, and if not, isn't this something that could be checked at startup time?

@joerunde (Collaborator Author) replied:

Ah, yeah, that's true. I was thinking there might be a case where something about the adapter needs validation, but I think you're right that anything about supporting LoRA would be checked either at boot time or at adapter load time.

joerunde added 2 commits April 8, 2025 16:40
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
@mergify mergify bot added the tpu Related to Google TPUs label Apr 8, 2025

@njhill (Member) left a comment

Thanks @joerunde, looks great ... would be good for @NickLucche to take a look too!

@NickLucche (Collaborator) left a comment

Fine work @joerunde, thanks!
I already have another check for TPU here (#16172), so the structured output exception will feel less lonely.

@joerunde (Collaborator Author) commented Apr 9, 2025

> the structured output exception will feel less lonely

Ah nice, everybody needs friends!

@joerunde joerunde added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 9, 2025
@njhill njhill merged commit cb391d8 into vllm-project:main Apr 9, 2025
57 checks passed
@joerunde joerunde deleted the platform-request-validation branch April 9, 2025 19:53

@yarongmu-google (Contributor) commented:

This PR has broken the benchmark_serving.py command; can we please roll back or fix?

Traceback (most recent call last):
File "/workspace/vllm/benchmarks/benchmark_serving.py", line 1083, in
main(args)
File "/workspace/vllm/benchmarks/benchmark_serving.py", line 684, in main
benchmark_result = asyncio.run(
File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/workspace/vllm/benchmarks/benchmark_serving.py", line 297, in benchmark
raise ValueError(
ValueError: Initial test run failed - Please make sure benchmark arguments are correctly specified. Error: Never received a valid chunk to calculate TTFT.This response will be marked as failed!

Repro:
(this) cb391d8 -> failed
(one before) fee5b8d -> good

@mgoin (Member) commented Apr 9, 2025

@yarongmu-google could you share the benchmark_serving.py command that failed? I tried a simple command and it worked

vllm serve meta-llama/Llama-3.1-8B-Instruct --port 9000

python benchmarks/benchmark_serving.py --backend vllm --model meta-llama/Llama-3.1-8B-Instruct --num-prompts 1 --dataset-name random --random-input 1024 --random-output 512 --port 9000
============ Serving Benchmark Result ============
Successful requests:                     1         
Benchmark duration (s):                  6.83      
Total input tokens:                      1024      
Total generated tokens:                  512       
Request throughput (req/s):              0.15      
Output token throughput (tok/s):         74.92     
Total Token throughput (tok/s):          224.77    
---------------Time to First Token----------------
Mean TTFT (ms):                          21.53     
Median TTFT (ms):                        21.53     
P99 TTFT (ms):                           21.53     
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          13.33     
Median TPOT (ms):                        13.33     
P99 TPOT (ms):                           13.33     
---------------Inter-token Latency----------------
Mean ITL (ms):                           13.33     
Median ITL (ms):                         10.36     
P99 ITL (ms):                            29.87     
==================================================

@yarongmu-google (Contributor) replied:

@mgoin

python benchmarks/benchmark_serving.py \
    --backend vllm \
    --model $MODEL \
    --dataset-name sonnet \
    --dataset-path benchmarks/sonnet_4x.txt \
    --sonnet-input-len 1800 \
    --sonnet-output-len 128 \
    --ignore-eos

where MODEL is Llama 3 70B. Note that this is run on a clean machine created only for perf benchmarks.

@yaochengji also saw the breakage. Chengji what's your command?

@yarongmu-google (Contributor) commented:

Note that the breakage is on TPU

@yaochengji (Collaborator) replied:

> @yaochengji also saw the breakage. Chengji what's your command?

I only saw this breakage in the CI test; my benchmarking command on the Llama-8B model works fine.

@yarongmu-google (Contributor) replied:

Hmmm .. maybe it's fixed somehow later?? Let's give it a bit more time. Sorry for flooding this PR :)

@yaochengji (Collaborator) replied:

> maybe it's fixed somehow later

I don't think so. It's the latest commit at the moment.

@mgoin (Member) commented Apr 10, 2025

I have posted a fix here; it is specific to TPU V1: #16369

@joerunde (Collaborator Author) commented:

Shoot, sorry for the breakage!

I had wrongly assumed that TPU tests would catch that case during the CI for this PR, before merging to main :(

yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Apr 21, 2025
…#16291)

Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Yang Wang <elainewy@meta.com>
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Apr 29, 2025
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…#16291)

Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>

Labels

ready (ONLY add when PR is ready to merge/full CI is needed), tpu (Related to Google TPUs), v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add platform api for request validation to reject requests that don't fit warmup shapes

6 participants