[main][refactor] Refactor forward metadata retrieval across DP nodes to reduce redundant padding. #2062
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##             main    #2062   +/-   ##
=======================================
  Coverage   76.34%   76.34%
=======================================
  Files         110      110
  Lines       12473    12473
=======================================
  Hits         9522     9522
  Misses       2951     2951
=======================================
```
I think the third scenario should be considered, in the
This pull request has conflicts; please resolve them before we can evaluate the pull request.
…s to reduce redundant padding.

Signed-off-by: yx0716 <jinyx1007@foxmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Given these two scenarios, it seems unlikely that a third case would arise at this point.
…s to reduce redundant padding. (vllm-project#2062)

Before refactoring cross-DP decode metadata aggregation, clean up the token-padding logic.

### What this PR does:

1. First checks whether any DP instance is in the prefill phase.
2. If in the `decode` phase and `torchair_graph_enabled` is true, pads each DP instance's token count up to the global maximum.
3. If in the `prefill` phase, or in the decode phase with graph mode **disabled**, returns each DP instance's original token count without padding.

This reordering removes the previous two-step padding/unpadding flow and ensures padding only occurs when strictly necessary.

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@bd3db7f

Signed-off-by: yx0716 <jinyx1007@foxmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Before refactoring cross-DP decode metadata aggregation, clean up the token-padding logic.

What this PR does:

1. First checks whether any DP instance is in the prefill phase.
2. If in the `decode` phase and `torchair_graph_enabled` is true, pads each DP instance's token count up to the global maximum.
3. If in the `prefill` phase, or in the decode phase with graph mode disabled, returns each DP instance's original token count without padding.

This reordering removes the previous two-step padding/unpadding flow and ensures padding only occurs when strictly necessary.
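The three-way decision above can be sketched as a small helper. This is a minimal illustration only, not the PR's actual implementation: the function name `padded_num_tokens` and the parameter names `with_prefill` and `torchair_graph_enabled` are assumptions chosen to mirror the description, not vLLM-Ascend's real API.

```python
def padded_num_tokens(num_tokens_across_dp: list[int],
                      with_prefill: bool,
                      torchair_graph_enabled: bool) -> list[int]:
    """Return the token count each DP rank should run with.

    Hypothetical sketch of the logic described in the PR: padding to the
    global maximum happens only for pure-decode batches when torchair
    graph mode is enabled; in every other case each rank keeps its
    original token count, so no unpadding step is needed later.
    """
    if not with_prefill and torchair_graph_enabled:
        # Pure-decode batch with graph mode on: pad every rank up to the
        # global maximum so the captured graph sees a uniform shape.
        max_tokens = max(num_tokens_across_dp)
        return [max_tokens] * len(num_tokens_across_dp)
    # Any prefill present, or graph mode disabled: no padding required.
    return list(num_tokens_across_dp)
```

For example, with per-rank counts `[3, 5, 2]`, a pure-decode batch under graph mode pads all ranks to 5, while the same counts with a prefill present (or with graph mode off) pass through unchanged.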