Commit 6dc86aa

amitm02 authored and ywang96 committed
[Core] feat: Implement Priority Scheduling in V1 Engine (vllm-project#19057)
Signed-off-by: amit <amit.man@gmail.com>
Co-authored-by: Roger Wang <Rogerw0108@gmail.com>
Signed-off-by: fhl <2410591650@qq.com>
1 parent 708948b

File tree

7 files changed: +896 −30 lines


docs/usage/v1_guide.md

Lines changed: 12 additions & 0 deletions
```diff
@@ -45,6 +45,18 @@ For each item, our progress towards V1 support falls into one of the following s
 - **🟠 Delayed**: Temporarily dropped in V1 but planned to be re-introduced later.
 - **🔴 Deprecated**: Not planned for V1 unless there is strong demand.
 
+!!! note
+    vLLM V1’s unified scheduler treats both prompt and output tokens the same
+    way by using a simple dictionary (e.g., `{request_id: num_tokens}`) to dynamically
+    allocate a fixed token budget per request, enabling features like chunked prefills,
+    prefix caching, and speculative decoding without a strict separation between prefill
+    and decode phases.
+
+The V1 scheduler supports multiple scheduling policies, including First-Come,
+First-Served (FCFS) and priority-based scheduling (where requests are processed
+based on assigned priority, with FCFS as a tie-breaker), configurable via the
+`--scheduling-policy` argument.
+
 ### Hardware
 
 | Hardware | Status |
```
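The note added above describes per-step token budgeting via a `{request_id: num_tokens}` map. A minimal sketch of that idea, with illustrative names (not vLLM's internals): `remaining` tracks tokens still needed per request, and a prefill longer than the leftover budget is chunked rather than scheduled all at once.

```python
def allocate_step(remaining, token_budget):
    """Split one scheduling step's token budget across requests.

    `remaining` maps request_id -> tokens still needed; prompt and
    output tokens are treated identically. A long prefill that exceeds
    the leftover budget is chunked (scheduled partially this step).
    """
    scheduled = {}
    for request_id, num_tokens in remaining.items():
        if token_budget == 0:
            break  # budget exhausted; remaining requests wait a step
        n = min(num_tokens, token_budget)
        scheduled[request_id] = n
        token_budget -= n
    return scheduled


# One decode token for "chat-1", a 500-token prefill for "doc-2":
print(allocate_step({"chat-1": 1, "doc-2": 500}, token_budget=256))
# {'chat-1': 1, 'doc-2': 255}  -> doc-2's prefill is chunked
```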
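The second added paragraph describes priority scheduling with FCFS as the tie-breaker. A toy sketch of that ordering, assuming a lower numeric `priority` value is served first; the class and the per-request `priority` field are hypothetical here, not vLLM's `Scheduler` API.

```python
import heapq
import itertools


class PriorityFcfsQueue:
    """Toy request queue: lower `priority` value is served first;
    arrival order (FCFS) breaks ties. Illustrative only."""

    def __init__(self):
        self._arrival = itertools.count()  # monotonic arrival sequence
        self._heap = []

    def add_request(self, request_id, priority=0):
        # Heap key (priority, arrival) makes FCFS the tie-breaker
        # within a priority level; plain FCFS is the key (arrival,).
        heapq.heappush(self._heap, (priority, next(self._arrival), request_id))

    def pop_next(self):
        return heapq.heappop(self._heap)[2]


q = PriorityFcfsQueue()
q.add_request("req-a", priority=1)
q.add_request("req-b", priority=0)  # more urgent (lower value)
q.add_request("req-c", priority=1)  # ties with req-a; arrived later

print([q.pop_next() for _ in range(3)])  # ['req-b', 'req-a', 'req-c']
```

On the CLI this policy corresponds to `--scheduling-policy priority` from the docs text above, with FCFS as the other supported value.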
