Motivation.
The default scheduler works well for the basic use case of serving with maximum throughput.
There are still some use cases in which we prioritize other metrics over maximum throughput, for example maintaining fairness between different users.
I specifically have a use case in which an application uses vLLM and tries to maintain fairness between the requests of its different users.
By making the scheduler component more abstract and replaceable (perhaps also pluggable), we can support such use cases without having to change the core scheduler logic for each of them.
Proposed Change.
I propose two different solutions. The first may be harder to implement, but it allows anyone to implement any scheduling logic they wish without changing any other core logic. The second is simple to implement but doesn't allow full control of the scheduler logic.
Solution 1 - Scheduler plugins
This solution requires defining an abstract base class for schedulers, and allowing the file path of the desired scheduler implementation to be passed as a CLI argument (or an environment variable).
This idea could also serve as the basis of scheduler plugins - meaning anyone could implement their own scheduler as a package separate from core vLLM, which allows for great extensibility and modularity.
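As a rough sketch of what Solution 1 could look like: an abstract base class, one example implementation that serves users round-robin, and a loader that imports a scheduler class from a file path. All names here (`BaseScheduler`, `RoundRobinUserScheduler`, `load_scheduler_class`, and the method signatures) are hypothetical and chosen for illustration; none of them are existing vLLM APIs, and a real interface would need to cover far more (memory budgets, swapping, sequence state).

```python
import importlib.util
from abc import ABC, abstractmethod
from collections import OrderedDict, deque


class BaseScheduler(ABC):
    """Hypothetical scheduler interface (not an existing vLLM class)."""

    @abstractmethod
    def add_request(self, request_id: str, user_id: str) -> None:
        """Queue a new request on behalf of a user."""

    @abstractmethod
    def schedule(self, max_batch_size: int) -> list[str]:
        """Return the ids of the requests to run in the next step."""


class RoundRobinUserScheduler(BaseScheduler):
    """Example plugin: take one pending request per user in turn."""

    def __init__(self) -> None:
        # Maps user_id to a FIFO queue of that user's pending request ids.
        self._queues = OrderedDict()

    def add_request(self, request_id: str, user_id: str) -> None:
        self._queues.setdefault(user_id, deque()).append(request_id)

    def schedule(self, max_batch_size: int) -> list[str]:
        batch = []
        while self._queues and len(batch) < max_batch_size:
            # Serve the user at the front of the line.
            user_id, queue = self._queues.popitem(last=False)
            batch.append(queue.popleft())
            if queue:
                # The user still has work: move them to the back of the line.
                self._queues[user_id] = queue
        return batch


def load_scheduler_class(path: str, class_name: str) -> type:
    """Load `class_name` from the Python file at `path` and validate it."""
    spec = importlib.util.spec_from_file_location("custom_scheduler", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load scheduler module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    cls = getattr(module, class_name)
    if not (isinstance(cls, type) and issubclass(cls, BaseScheduler)):
        raise TypeError(f"{class_name} must subclass BaseScheduler")
    return cls
```

With this shape, the engine would only ever call the methods declared on the base class, so a scheduler shipped as a separate package would not require changes to core code. The file path and class name would come from whatever CLI argument or environment variable is eventually chosen.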
Solution 2 - Support voluntary preemption hooks
This solution is less flexible but should still allow support for most scheduling logic.
This solution means that the Scheduler class should expose public methods to preempt/suspend and resume a SequenceGroup, and then the API can add routes to expose these methods.
This way we allow applications wrapping vLLM to implement their own complex scheduling logic, to give each user its fair share of scheduling, or any other desired scheduling logic.
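To make the idea concrete, here is a toy model of such hooks. It is only a sketch under simplifying assumptions: the `SequenceGroup` and `Scheduler` classes below are stand-ins written for this example, not vLLM's real classes, and the method names `preempt`, `resume`, and `step` are hypothetical. Real preemption would also have to deal with KV cache state (swapping or recomputation), which is not modeled here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SequenceGroup:
    """Simplified stand-in for a group of sequences from one request."""
    request_id: str


class Scheduler:
    """Toy scheduler with waiting, running, and suspended states."""

    def __init__(self) -> None:
        self.waiting = deque()   # groups not yet admitted
        self.running = {}        # request_id -> group currently running
        self.suspended = {}      # request_id -> group parked by the caller

    def add(self, group: SequenceGroup) -> None:
        self.waiting.append(group)

    def step(self, max_running: int) -> list[str]:
        """Admit waiting groups while there is capacity; return running ids."""
        while self.waiting and len(self.running) < max_running:
            group = self.waiting.popleft()
            self.running[group.request_id] = group
        return list(self.running)

    def preempt(self, request_id: str) -> bool:
        """Hook: park a running or waiting group until it is resumed."""
        group = self.running.pop(request_id, None)
        if group is None:
            for candidate in self.waiting:
                if candidate.request_id == request_id:
                    group = candidate
                    break
            if group is None:
                return False  # unknown or already suspended
            self.waiting.remove(group)
        self.suspended[request_id] = group
        return True

    def resume(self, request_id: str) -> bool:
        """Hook: make a suspended group eligible for scheduling again."""
        group = self.suspended.pop(request_id, None)
        if group is None:
            return False
        # Resumed work goes to the front so it is admitted first.
        self.waiting.appendleft(group)
        return True
```

An application wrapping the server could then enforce fairness from the outside, for example by calling `preempt` on requests from a user who has exceeded their share and `resume` once other users have been served. The API routes mentioned above would be thin wrappers around these two calls.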
Feedback Period.
No response
CC List.
No response
Any Other Things.
Just to make it clear, I'll be happy to implement this, but I want to hear some feedback before I go ahead with the implementation.