[V1] Aggregate chunked prompt logprobs in model runner #14875

njhill · 2025-03-15T23:29:31Z

Addresses #14239

An alternative to #14240 that avoids tensor concatenation by pre-allocating CPU tensors and copying GPU chunks into them.
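To illustrate the pre-allocation idea described above, here is a minimal sketch and not the PR's actual code. Plain Python lists stand in for the pre-allocated CPU tensor and for the per-chunk GPU results. The class name `PromptLogprobsBuffer` and its methods are hypothetical and were chosen only for this example.

```python
from typing import List, Optional


class PromptLogprobsBuffer:
    """Hypothetical stand-in for a pre-allocated CPU tensor.

    A plain list is used so the sketch runs without any tensor library.
    """

    def __init__(self, num_rows: int) -> None:
        # Allocate the full-size destination once, up front.
        self.rows: List[Optional[float]] = [None] * num_rows
        self.filled = 0  # number of rows written so far

    def add_chunk(self, chunk: List[float]) -> None:
        end = self.filled + len(chunk)
        if end > len(self.rows):
            raise ValueError("chunk overflows the pre-allocated buffer")
        # Copy into the reserved slice instead of building a new, longer
        # sequence by concatenation on every chunk.
        self.rows[self.filled:end] = chunk
        self.filled = end

    def is_complete(self) -> bool:
        return self.filled == len(self.rows)


buf = PromptLogprobsBuffer(num_rows=5)
buf.add_chunk([-0.1, -0.2])        # first prefill chunk
buf.add_chunk([-0.3, -0.4, -0.5])  # second prefill chunk
```

With this shape, each chunk costs one copy into a fixed destination, and the aggregated result is only handed on once `is_complete()` returns true.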

github-actions · 2025-03-15T23:29:40Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀


afeldman-nm

Hi @njhill, thanks for this nice, concise fix. I made a few suggestions. Remarkably, it does not seem that this PR requires unit test changes, since partial prefill was never unit-tested (whoops) ;)

vllm/v1/worker/gpu_model_runner.py

afeldman-nm · 2025-03-19T19:59:59Z

vllm/v1/metrics/stats.py

-            # partially completed prompt.
-            # This will be reverted in a follow up PR and we should re-enable
-            # this assertion / invariant.
+        if is_prefilling:


Might want @markmc's review on stats, although this all looks good to me.

vllm/v1/core/scheduler.py

vllm/v1/worker/gpu_model_runner.py

mergify · 2025-03-21T02:22:21Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @njhill.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


vllm/v1/worker/gpu_input_batch.py

vllm/v1/worker/gpu_model_runner.py

robertgshaw2-redhat · 2025-03-21T22:14:05Z

There are two comments:

There is a memory leak if requests are aborted during partial prefills with prompt logprobs.
I think we can simplify some special casing (this is a nit).
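The leak concern in the first comment can be sketched as follows. This is an illustrative assumption about how per-request partial state might be tracked, not the code under review. The class `InProgressPromptLogprobs` and its methods are hypothetical names.

```python
from typing import Dict, List, Optional


class InProgressPromptLogprobs:
    """Hypothetical tracker for partially aggregated prompt logprobs."""

    def __init__(self) -> None:
        # Maps request id -> logprobs collected so far for that request.
        self._partial: Dict[str, List[float]] = {}

    def add_chunk(self, req_id: str, chunk: List[float],
                  is_last: bool) -> Optional[List[float]]:
        rows = self._partial.setdefault(req_id, [])
        rows.extend(chunk)
        if not is_last:
            return None
        # Prefill finished: return the result and drop the entry.
        return self._partial.pop(req_id)

    def abort(self, req_id: str) -> None:
        # Without this cleanup, a request aborted mid-prefill would leave
        # its partial state in the dict indefinitely.
        self._partial.pop(req_id, None)

    def num_tracked(self) -> int:
        return len(self._partial)


tracker = InProgressPromptLogprobs()
tracker.add_chunk("req-a", [-0.1], is_last=False)
tracker.abort("req-a")  # aborted during partial prefill
done = tracker.add_chunk("req-b", [-0.2, -0.3], is_last=True)
```

The point of the sketch is that every exit path for a request, completion or abort, has to release the partial state.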


robertgshaw2-redhat · 2025-03-24T16:28:03Z

LGTM thanks!


njhill requested review from WoosukKwon, alexm-redhat, comaniac, robertgshaw2-redhat and ywang96 as code owners March 15, 2025 23:29

mergify bot added the v1 label Mar 15, 2025

[V1] Aggregate prompt logprobs in model runner

0d450f8

Signed-off-by: Nick Hill <nhill@redhat.com>

njhill changed the title ~~[V1] Aggregate prompt logprobs in model runner~~ [V1] Aggregate chunked prompt logprobs in model runner Mar 16, 2025

njhill force-pushed the agg-prompt-logprob-chunks branch from b69760b to 0d450f8 Compare March 16, 2025 00:12

njhill mentioned this pull request Mar 16, 2025

[V1] Aggregate prompt logprobs in EngineCore #14240

Closed

njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) Mar 16, 2025

njhill added 3 commits March 16, 2025 10:57

fix dtype

59e10c0

Signed-off-by: Nick Hill <nhill@redhat.com>

Fix edge case

3a3deb1

Signed-off-by: Nick Hill <nhill@redhat.com>

Merge remote-tracking branch 'refs/remotes/origin/main' into agg-prom…

34daab3

…pt-logprob-chunks

afeldman-nm suggested changes Mar 19, 2025


afeldman-nm approved these changes Mar 20, 2025


mergify bot added the needs-rebase label Mar 21, 2025

Merge remote-tracking branch 'origin/main' into agg-prompt-logprob-ch…

b9d1873

…unks

mergify bot removed the needs-rebase label Mar 21, 2025

njhill added 2 commits March 21, 2025 08:35

add comment per review

931d7e4

Signed-off-by: Nick Hill <nhill@redhat.com>

Merge remote-tracking branch 'origin/main' into agg-prompt-logprob-ch…

196f1a9

…unks

robertgshaw2-redhat reviewed Mar 21, 2025


vllm/v1/worker/gpu_input_batch.py (resolved)

robertgshaw2-redhat reviewed Mar 21, 2025


vllm/v1/worker/gpu_model_runner.py (outdated, resolved)

njhill added 2 commits March 22, 2025 08:49

address review comments

fec7927

Signed-off-by: Nick Hill <nhill@redhat.com>

move LogprobsTensors construction to static method

acbbfc0

Signed-off-by: Nick Hill <nhill@redhat.com>

robertgshaw2-redhat approved these changes Mar 24, 2025


robertgshaw2-redhat merged commit 3aee657 into vllm-project:main Mar 24, 2025
32 of 33 checks passed

njhill deleted the agg-prompt-logprob-chunks branch March 24, 2025 16:41

njhill mentioned this pull request Mar 24, 2025

[Misc]: [V1] prompt logprobs + chunked prefill can result in EngineCore partial prefill output #14239

Closed

1 task

erictang000 pushed a commit to erictang000/vllm that referenced this pull request Mar 25, 2025

[V1] Aggregate chunked prompt logprobs in model runner (vllm-project#…

f3c79d4

…14875) Signed-off-by: Nick Hill <nhill@redhat.com>

wrmedford pushed a commit to wrmedford/vllm that referenced this pull request Mar 26, 2025

[V1] Aggregate chunked prompt logprobs in model runner (vllm-project#…

1020254

…14875) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Wes Medford <wryanmedford@gmail.com>

lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025

[V1] Aggregate chunked prompt logprobs in model runner (vllm-project#…

a68b528

…14875) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>

ckhordiasma mentioned this pull request Apr 17, 2025

[do not merge] pr test for nm changes into 2.20 red-hat-data-services/vllm#107

Closed

lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025

[V1] Aggregate chunked prompt logprobs in model runner (vllm-project#…

7f741c4

…14875) Signed-off-by: Nick Hill <nhill@redhat.com>

shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025

[V1] Aggregate chunked prompt logprobs in model runner (vllm-project#…

78dc690

…14875) Signed-off-by: Nick Hill <nhill@redhat.com>

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025

[V1] Aggregate chunked prompt logprobs in model runner (vllm-project#…

81ddc54

…14875) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
