Description
See current V1 workarounds here: vllm-project/vllm#14242
Requests must have both a prompt length and a requested number of tokens less than or equal to the corresponding values of a single warmup shape. A request that matches no warmup shape in this way must be rejected.
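The matching rule above can be sketched as a small predicate. This is a minimal illustration, not the actual vllm-spyre implementation; the dataclass and field names (`WarmupShape`, `prompt_length`, `new_tokens`) are assumptions for the sketch:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class WarmupShape:
    # Hypothetical field names; the real config may differ.
    prompt_length: int
    new_tokens: int


def matches_any_shape(
    prompt_len: int,
    requested_tokens: int,
    shapes: Sequence[WarmupShape],
) -> bool:
    """A request is servable only if a single warmup shape covers
    both its prompt length and its requested token count."""
    return any(
        prompt_len <= s.prompt_length and requested_tokens <= s.new_tokens
        for s in shapes
    )
```

Note that both bounds must hold against the *same* shape: a request whose prompt fits one shape and whose token count fits a different shape still matches nothing.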
In the V0 implementation, the scheduler checks this constraint and marks any request that matches no warmup shape as ignored. In V1, this currently does not work because the engine has no logic to handle requests that are rejected immediately.
We could implement this logic in the engine, or we could explore extending the platform API to let it validate requests as they are added, rather than at schedule time. That alternative approach may allow rejecting requests with a 400-type error instead of returning empty results.
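The add-time validation alternative could look roughly like the sketch below. Everything here is hypothetical: `RequestRejectedError`, `validate_request`, and the idea that the front end maps the exception to an HTTP 400 are assumptions about how such a platform hook might be wired, not the existing API:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class WarmupShape:
    # Hypothetical field names; the real config may differ.
    prompt_length: int
    new_tokens: int


class RequestRejectedError(ValueError):
    """Raised when a request is added, so the serving layer can
    translate it into a 400-type client error instead of letting
    the request reach the scheduler and return empty results."""


def validate_request(
    prompt_len: int,
    requested_tokens: int,
    shapes: Sequence[WarmupShape],
) -> None:
    # Reject at add-time if no single warmup shape covers the request.
    if not any(
        prompt_len <= s.prompt_length and requested_tokens <= s.new_tokens
        for s in shapes
    ):
        raise RequestRejectedError(
            f"request (prompt_len={prompt_len}, "
            f"requested_tokens={requested_tokens}) fits no warmup shape"
        )
```

Failing fast here keeps invalid requests out of the scheduler entirely, which is what makes the 400-style response possible.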