Remove equivalence cache from the scheduler code base #71013
Comments
@resouer do you have cycles to work on this issue?
I suppose cleaning up the current eCache codebase is a prerequisite for @resouer's new eCache implementation. (But if I misunderstand, I have bandwidth to work on the cleanup.)
@Huang-Wei if you have the bandwidth, I will happily assign it to you, but let's wait a bit for Harry to respond as well.
/assign @resouer for commenting. |
Thanks @Huang-Wei! Actually, I've set up a tracking issue for the new design of the equivalence class, and cleaning up the old code is the first task there. So let me handle it.
Sure, that's totally fine, I suppose that's a pre-task for your implementation. |
What would you like to be added:
Remove equivalence cache from the scheduler code base.
Why is this needed:
The equivalence cache (eCache) was added to the scheduler as a mechanism to improve the performance of running predicate functions. It stores the results of predicates for pods, and as long as a node's conditions have not changed, it reuses the cached results for pods that have the same scheduling requirements.
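To make the mechanism concrete, here is a minimal, hypothetical sketch of such a cache: results are keyed by node, then predicate, then an "equivalence hash" derived from a pod's scheduling requirements. The type and method names are illustrative only, not the actual scheduler code.

```go
package main

import (
	"fmt"
	"sync"
)

// equivalenceCache is an illustrative sketch, not the real implementation:
// node name -> predicate name -> equivalence hash -> "pod fits" result.
type equivalenceCache struct {
	mu    sync.RWMutex
	cache map[string]map[string]map[uint64]bool
}

func newEquivalenceCache() *equivalenceCache {
	return &equivalenceCache{cache: map[string]map[string]map[uint64]bool{}}
}

// lookup returns the cached result and whether a cached entry exists.
func (c *equivalenceCache) lookup(node, predicate string, hash uint64) (fit, ok bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if preds, exists := c.cache[node]; exists {
		if results, exists := preds[predicate]; exists {
			fit, ok = results[hash]
			return fit, ok
		}
	}
	return false, false
}

// store records a predicate result; any event that changes the node's
// conditions must invalidate these entries, which is where much of the
// complexity discussed in this issue comes from.
func (c *equivalenceCache) store(node, predicate string, hash uint64, fit bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.cache[node] == nil {
		c.cache[node] = map[string]map[uint64]bool{}
	}
	if c.cache[node][predicate] == nil {
		c.cache[node][predicate] = map[uint64]bool{}
	}
	c.cache[node][predicate][hash] = fit
}

func main() {
	c := newEquivalenceCache()
	c.store("node-1", "PodFitsResources", 42, true)
	fit, ok := c.lookup("node-1", "PodFitsResources", 42)
	fmt.Println(fit, ok) // true true
}
```

Note that every lookup and store must take the cache-wide lock, which hints at the contention problem described below.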
On paper, this should have improved the scheduler's performance significantly; in practice it slowed the scheduler down in many common scenarios. The reasons turned out to be lock contention and the cost of accessing a three-level cache, which was sometimes slower than running the predicates themselves.
We then tried to optimize the locking mechanism. This improved performance over the previous implementation, but the scheduler was still slower than without the eCache for pods that didn't have complex scheduling requirements. It did improve performance for pods with inter-pod affinity/anti-affinity. However, that was before we added further optimizations that improved affinity/anti-affinity performance by 5x, so the eCache's benefit for affinity/anti-affinity is much smaller now, though still significant enough to consider keeping it.

More importantly, the eCache complicates our code base considerably, and invalidating it on various events has proven error-prone and makes building some new scheduling features harder. For example, ensuring that dynamic volume binding works with the eCache proved non-trivial. As a result, we have decided to remove the current implementation of the eCache.
Our plan is to redesign the equivalence cache with a different mechanism that prevents the scheduler from repeatedly retrying a large number of equivalent pods after finding one of them unschedulable. When one pod is determined to be unschedulable, all equivalent pods will be unschedulable as well, so the scheduler can save CPU cycles and try other pods instead.
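A rough sketch of what such a mechanism might look like (this is purely an assumption about the redesign, using invented names; the actual design is yet to be written): derive an equivalence-class key from a pod's scheduling requirements, remember which classes were found unschedulable, and skip equivalent pods until a cluster event invalidates the record.

```go
package main

import "fmt"

// podSpec is a stand-in for a pod's scheduling requirements; here the
// owner UID is used as the equivalence key, on the assumption that pods
// of the same controller (e.g. a ReplicaSet) are equivalent.
type podSpec struct {
	ownerUID string
}

// unschedulableClasses records equivalence classes already found
// unschedulable. Entirely hypothetical, for illustration only.
type unschedulableClasses map[string]bool

func (u unschedulableClasses) markUnschedulable(p podSpec) { u[p.ownerUID] = true }

// shouldSkip reports whether an equivalent pod was already found
// unschedulable, letting the scheduler move on without retrying.
func (u unschedulableClasses) shouldSkip(p podSpec) bool { return u[p.ownerUID] }

// invalidate clears the record, e.g. after a node or pod event changes
// cluster state and previously unschedulable classes may now fit.
func (u unschedulableClasses) invalidate() {
	for k := range u {
		delete(u, k)
	}
}

func main() {
	u := unschedulableClasses{}
	u.markUnschedulable(podSpec{ownerUID: "rs-abc"})
	fmt.Println(u.shouldSkip(podSpec{ownerUID: "rs-abc"})) // true
	fmt.Println(u.shouldSkip(podSpec{ownerUID: "rs-xyz"})) // false
}
```

Unlike the old eCache, this approach caches only negative scheduling outcomes per class rather than per-predicate results per node, which should sidestep the invalidation complexity described above.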
/kind cleanup
/sig scheduling