
[V1] Reject requests that don't fit a warmup shape #16

@joerunde

Description


See current V1 workarounds here: vllm-project/vllm#14242

A request's prompt length and its requested number of new tokens must both be less than or equal to the corresponding settings of a single warmup shape. If no warmup shape satisfies both conditions at once, the request must be rejected.
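A minimal sketch of the matching rule, assuming each warmup shape is described by a prompt length and a number of new tokens. `WarmupShape` and `find_warmup_shape` are hypothetical names for illustration, not vLLM or plugin APIs. Note that both limits must be satisfied by the *same* shape:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class WarmupShape:
    # Hypothetical container; field names are illustrative only.
    prompt_length: int
    new_tokens: int


def find_warmup_shape(
    prompt_len: int, max_tokens: int, shapes: Sequence[WarmupShape]
) -> Optional[WarmupShape]:
    """Return the first shape that fits BOTH limits, or None if the request must be rejected."""
    for shape in shapes:
        if prompt_len <= shape.prompt_length and max_tokens <= shape.new_tokens:
            return shape
    return None


shapes = [WarmupShape(64, 200), WarmupShape(2048, 20)]
# A 100-token prompt asking for 100 new tokens fits the prompt limit of the
# second shape and the token limit of the first, but no single shape fits both.
print(find_warmup_shape(100, 100, shapes))  # None
```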

In the V0 implementation, the scheduler checks this constraint and marks a request as ignored if it matches no warmup shape. In V1 this does not currently work, because the engine has no logic for handling requests that are rejected immediately.
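A rough sketch of the V0-style, schedule-time approach: waiting requests are split into a scheduled list and an ignored list. The `Request` type and `schedule` function are hypothetical simplifications, not vLLM's scheduler classes. In this model, the engine still needs a path that finishes the ignored requests, which is the piece described above as missing in V1.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class WarmupShape:
    # Hypothetical container; field names are illustrative only.
    prompt_length: int
    new_tokens: int


@dataclass
class Request:
    # Hypothetical, minimal request record.
    request_id: str
    prompt_len: int
    max_tokens: int


def schedule(
    waiting: List[Request], shapes: Sequence[WarmupShape]
) -> Tuple[List[Request], List[Request]]:
    """Partition waiting requests at schedule time into (scheduled, ignored)."""
    scheduled: List[Request] = []
    ignored: List[Request] = []
    for req in waiting:
        fits = any(
            req.prompt_len <= s.prompt_length and req.max_tokens <= s.new_tokens
            for s in shapes
        )
        # Requests that fit no shape are ignored; the caller must finish them.
        (scheduled if fits else ignored).append(req)
    return scheduled, ignored
```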

We could implement this logic in the engine, or we could explore extending the platform API so that it can validate requests when they are added instead of at schedule time. That alternative may allow rejecting requests with a 400-type error instead of returning empty results.
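A minimal sketch of the add-time alternative, assuming a hypothetical platform hook named `validate_request` that raises `ValueError` when a request fits no warmup shape. `HypotheticalPlatform` and `HypotheticalEngine` are stand-ins, not vLLM classes. Because the check runs before the request is enqueued, the scheduler never sees an invalid request, and an API server could translate the exception into a 400-type response instead of returning empty results.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class WarmupShape:
    # Hypothetical container; field names are illustrative only.
    prompt_length: int
    new_tokens: int


class HypotheticalPlatform:
    """Stand-in platform; validate_request is an assumed hook, not a confirmed API."""

    def __init__(self, warmup_shapes: Sequence[WarmupShape]):
        self.warmup_shapes = list(warmup_shapes)

    def validate_request(self, prompt_len: int, max_tokens: int) -> None:
        # Reject unless a single shape satisfies both limits.
        if not any(
            prompt_len <= s.prompt_length and max_tokens <= s.new_tokens
            for s in self.warmup_shapes
        ):
            raise ValueError(
                f"No warmup shape fits prompt_len={prompt_len}, max_tokens={max_tokens}"
            )


class HypotheticalEngine:
    def __init__(self, platform: HypotheticalPlatform):
        self.platform = platform
        self.requests: Dict[str, Tuple[int, int]] = {}

    def add_request(self, request_id: str, prompt_len: int, max_tokens: int) -> None:
        # Validate first: an invalid request raises here and is never enqueued.
        self.platform.validate_request(prompt_len, max_tokens)
        self.requests[request_id] = (prompt_len, max_tokens)
```

The design choice is simply where the check lives: at add time the caller receives an error synchronously, whereas at schedule time the request has already been accepted and can only be finished with an empty result.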
