Remove equivalence cache from the scheduler code base #71013
Comments
@resouer do you have cycles to work on this issue?
I suppose cleaning up the current eCache codebase is a prerequisite for @resouer's new eCache implementation. (But if I misunderstand, I have bandwidth to work on the cleanup.)
@Huang-Wei if you have the bandwidth, I will happily assign it to you, but let's wait a bit for Harry to respond as well.
/assign @resouer for commenting. |
Thanks @Huang-Wei! Actually, I've set up a tracking issue for the new design of the equivalence class, and cleaning up the old code is the first task there. So let me handle it.
Sure, that's totally fine, I suppose that's a pre-task for your implementation. |
What would you like to be added:
Remove equivalence cache from the scheduler code base.
Why is this needed:
The equivalence cache (eCache) was added to the scheduler as a mechanism to improve the performance of running predicate functions. It stores the results of predicates for pods, and as long as a node's conditions have not changed, it reuses the cached results for pods that have the same scheduling requirements.
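To make the mechanism concrete, here is a minimal, hypothetical sketch of such a cache: results are keyed by node, then predicate, then an "equivalence hash" derived from a pod's scheduling requirements. The type and method names are illustrative only, not the actual scheduler code.

```go
package main

import (
	"fmt"
	"sync"
)

// equivalenceCache is an illustrative sketch, not the real implementation:
// node name -> predicate name -> equivalence hash -> "pod fits" result.
type equivalenceCache struct {
	mu    sync.RWMutex
	cache map[string]map[string]map[uint64]bool
}

func newEquivalenceCache() *equivalenceCache {
	return &equivalenceCache{cache: map[string]map[string]map[uint64]bool{}}
}

// lookup returns the cached result and whether a cached entry exists.
func (c *equivalenceCache) lookup(node, predicate string, hash uint64) (fit, ok bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if preds, exists := c.cache[node]; exists {
		if results, exists := preds[predicate]; exists {
			fit, ok = results[hash]
			return fit, ok
		}
	}
	return false, false
}

// store records a predicate result; any event that changes the node's
// conditions must invalidate these entries, which is where much of the
// complexity discussed in this issue comes from.
func (c *equivalenceCache) store(node, predicate string, hash uint64, fit bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.cache[node] == nil {
		c.cache[node] = map[string]map[uint64]bool{}
	}
	if c.cache[node][predicate] == nil {
		c.cache[node][predicate] = map[uint64]bool{}
	}
	c.cache[node][predicate][hash] = fit
}

func main() {
	c := newEquivalenceCache()
	c.store("node-1", "PodFitsResources", 42, true)
	fit, ok := c.lookup("node-1", "PodFitsResources", 42)
	fmt.Println(fit, ok) // true true
}
```

Note that every lookup and store must take the cache-wide lock, which hints at the contention problem described below.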
On paper, this should have improved the scheduler's performance significantly; in practice it slowed the scheduler down in many common scenarios. The reasons turned out to be lock contention and the cost of accessing a three-level cache, which was sometimes slower than running the predicates themselves.
We then tried to optimize the locking mechanism. This improved performance over the previous implementation, but the scheduler was still slower than without the eCache for pods that didn't have complex scheduling requirements. It did improve performance for pods with inter-pod affinity/anti-affinity. However, that was before we added further optimizations that improved affinity/anti-affinity performance by 5x, so the eCache's benefit for affinity/anti-affinity is much smaller now, though still significant enough to consider keeping it.

More importantly, the eCache complicates our code base considerably, and invalidating it on various events has proven error-prone and makes building some new scheduling features harder. For example, ensuring that dynamic volume binding works with the eCache proved non-trivial. As a result, we have decided to remove the current implementation of the eCache.
Our plan is to redesign the equivalence cache with a different mechanism that prevents the scheduler from repeatedly retrying a large number of equivalent pods after finding one of them unschedulable. When one pod is determined to be unschedulable, all equivalent pods will be unschedulable as well, so the scheduler can save CPU cycles and try other pods instead.
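A rough sketch of what such a mechanism might look like (this is purely an assumption about the redesign, using invented names; the actual design is yet to be written): derive an equivalence-class key from a pod's scheduling requirements, remember which classes were found unschedulable, and skip equivalent pods until a cluster event invalidates the record.

```go
package main

import "fmt"

// podSpec is a stand-in for a pod's scheduling requirements; here the
// owner UID is used as the equivalence key, on the assumption that pods
// of the same controller (e.g. a ReplicaSet) are equivalent.
type podSpec struct {
	ownerUID string
}

// unschedulableClasses records equivalence classes already found
// unschedulable. Entirely hypothetical, for illustration only.
type unschedulableClasses map[string]bool

func (u unschedulableClasses) markUnschedulable(p podSpec) { u[p.ownerUID] = true }

// shouldSkip reports whether an equivalent pod was already found
// unschedulable, letting the scheduler move on without retrying.
func (u unschedulableClasses) shouldSkip(p podSpec) bool { return u[p.ownerUID] }

// invalidate clears the record, e.g. after a node or pod event changes
// cluster state and previously unschedulable classes may now fit.
func (u unschedulableClasses) invalidate() {
	for k := range u {
		delete(u, k)
	}
}

func main() {
	u := unschedulableClasses{}
	u.markUnschedulable(podSpec{ownerUID: "rs-abc"})
	fmt.Println(u.shouldSkip(podSpec{ownerUID: "rs-abc"})) // true
	fmt.Println(u.shouldSkip(podSpec{ownerUID: "rs-xyz"})) // false
}
```

Unlike the old eCache, this approach caches only negative scheduling outcomes per class rather than per-predicate results per node, which should sidestep the invalidation complexity described above.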
/kind cleanup
/sig scheduling