[RFC]: Replaceable Scheduler #7123

Open
NadavShmayo opened this issue Aug 4, 2024 · 2 comments

@NadavShmayo
Contributor

NadavShmayo commented Aug 4, 2024

Motivation.

The default scheduler works well for the basic use case of serving with maximum throughput.
However, some use cases prioritize other metrics over maximum throughput, for example maintaining fairness between different users.

I specifically have an application that uses vLLM and tries to maintain fairness between requests from different users of the application.
By making the scheduler component more abstract and replaceable (perhaps also pluggable), we can support such use cases without having to change the core scheduler logic for each of them.

Proposed Change.

I propose two different solutions. The first may be hard to implement, but allows anyone to implement any scheduling logic they wish without changing any other core logic. The second is simple to implement but doesn't allow full control of the scheduler logic.

Solution 1 - Scheduler plugins

This solution requires defining an abstract base class for schedulers and allowing users to pass the file path of the desired scheduler implementation as a CLI argument (or an environment variable).
This idea could also serve as the basis for scheduler plugins: anyone could implement their own scheduler as a package separate from core vLLM, which allows for great extensibility and modularity.
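
To make the idea concrete, here is a rough sketch of what such an interface and loader might look like. Everything in it is an assumption for illustration: `BaseScheduler`, `RoundRobinScheduler`, and `load_scheduler_class` are hypothetical names, not vLLM's actual API, and the loader takes a `module:ClassName` string rather than a raw file path to keep the example short.

```python
import importlib
from abc import ABC, abstractmethod
from collections import OrderedDict, deque


class BaseScheduler(ABC):
    """Hypothetical scheduler interface (not vLLM's actual Scheduler API)."""

    @abstractmethod
    def add_request(self, user_id: str, request_id: str) -> None:
        """Register a new request on behalf of a user."""

    @abstractmethod
    def schedule(self, max_requests: int) -> list:
        """Return the request ids to run next, at most max_requests of them."""


class RoundRobinScheduler(BaseScheduler):
    """Toy fairness policy: take one waiting request per user in turn."""

    def __init__(self) -> None:
        # Maps user id -> queue of waiting request ids, in rotation order.
        self._queues = OrderedDict()

    def add_request(self, user_id: str, request_id: str) -> None:
        self._queues.setdefault(user_id, deque()).append(request_id)

    def schedule(self, max_requests: int) -> list:
        picked = []
        while len(picked) < max_requests and self._queues:
            # Pop the user at the front of the rotation.
            user_id, queue = self._queues.popitem(last=False)
            picked.append(queue.popleft())
            if queue:
                # User still has work: move them to the back of the rotation.
                self._queues[user_id] = queue
        return picked


def load_scheduler_class(spec: str) -> type:
    """Resolve a 'module:ClassName' string to a BaseScheduler subclass."""
    module_name, _, class_name = spec.partition(":")
    cls = getattr(importlib.import_module(module_name), class_name)
    if not (isinstance(cls, type) and issubclass(cls, BaseScheduler)):
        raise TypeError(f"{spec} is not a BaseScheduler subclass")
    return cls
```

With something like this, the value of the CLI argument or environment variable would be passed to `load_scheduler_class`, and the engine would instantiate whatever class comes back. A third-party package would only need to subclass `BaseScheduler`.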

Solution 2 - Support voluntary preemption hooks

This solution is less flexible but should still support most scheduling logic.
The Scheduler class would expose public methods to preempt/suspend and resume a SequenceGroup, and the API would add routes that expose these methods.
This way, applications wrapping vLLM can implement their own complex scheduling logic, whether to give each user its fair share of scheduling or to apply any other desired policy.
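
As a minimal sketch of the hook idea, the toy class below shows suspend and resume methods that an API route could call. `HookableScheduler` and its method names are hypothetical and do not reflect the existing Scheduler class; request ids stand in for SequenceGroups, and the question of what happens to a suspended request's KV cache is left out entirely.

```python
from collections import deque


class HookableScheduler:
    """Toy stand-in for a scheduler; names are hypothetical, not vLLM's API."""

    def __init__(self) -> None:
        self.waiting = deque()   # request ids eligible for scheduling, in order
        self.suspended = set()   # request ids parked by the wrapping application

    def add_request(self, request_id: str) -> None:
        self.waiting.append(request_id)

    def suspend(self, request_id: str) -> bool:
        """Voluntarily preempt a request; returns False if it is not schedulable."""
        if request_id not in self.waiting:
            return False
        self.waiting.remove(request_id)
        self.suspended.add(request_id)
        return True

    def resume(self, request_id: str) -> bool:
        """Make a suspended request schedulable again, at the back of the queue."""
        if request_id not in self.suspended:
            return False
        self.suspended.remove(request_id)
        self.waiting.append(request_id)
        return True

    def schedule(self, max_requests: int) -> list:
        """Return the ids to run in the next step; suspended ids are skipped."""
        count = min(max_requests, len(self.waiting))
        return [self.waiting[i] for i in range(count)]
```

An application enforcing per-user fairness could then call `suspend` on requests from a user who has exceeded their share and `resume` them later, without the scheduler itself knowing anything about users.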

Feedback Period.

No response

CC List.

No response

Any Other Things.

Just to make it clear, I'll be happy to implement this, but I want to hear some feedback before I go ahead.

@njhill
Member

njhill commented Aug 6, 2024

FYI @apatke @saurabhjha1

@apatke
Contributor

apatke commented Aug 7, 2024

Regarding Solution 2, PTAL at #6077 and let us know if you have any feedback

3 participants