Avoid accidental impact for WeightPreference on the cache #3393
Codecov Report
@@            Coverage Diff             @@
##           master    #3393      +/-   ##
==========================================
- Coverage   51.78%   51.76%   -0.03%
==========================================
  Files         210      210
  Lines       18928    18932       +4
==========================================
- Hits         9802     9800       -2
- Misses       8584     8590       +6
  Partials      542      542
... and 1 file with indirect coverage changes
Nice job!
Hi @whitewindmills, thanks. Can you help add a release note? We may need to cherry-pick this patch to the previous release branches.
I will add it to my queue and will try to review it tomorrow.
Thanks for your reminder.
I believe this patch fixes the bug exactly, but what I'm wondering is: if a similar issue (mutating the cache) exists somewhere else, how can we eliminate this problem fundamentally?
I just revisited the relevant code. The whole scheduling process works on a pointer to the cached object, which is very scary.
So I tend to deep copy the RB/CRB immediately after it is listed from the cache. I know this approach might be more expensive, since we are going to copy a whole object instead of one field, but I still think we should do it.
@whitewindmills @Garrybest What's your opinion?
The following diff illustrates my idea.
diff --git a/pkg/scheduler/scheduler.go b/pkg/scheduler/scheduler.go
index e372c695..e2561bce 100644
--- a/pkg/scheduler/scheduler.go
+++ b/pkg/scheduler/scheduler.go
@@ -298,6 +298,7 @@ func (s *Scheduler) doScheduleBinding(namespace, name string) (err error) {
 		}
 		return err
 	}
+	rb = rb.DeepCopy()
 	if rb.Spec.Placement == nil {
 		// never reach here
@@ -345,6 +346,7 @@ func (s *Scheduler) doScheduleClusterBinding(name string) (err error) {
 		}
 		return err
 	}
+	crb = crb.DeepCopy()
 	if crb.Spec.Placement == nil {
 		// never reach here
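To make the hazard concrete, here is a minimal, self-contained sketch of the bug class discussed above. All types and functions in it (`Binding`, `Placement`, `ReplicaScheduling`, `StaticWeight`, `cache`, `schedule`) are hypothetical, simplified stand-ins and not the real Karmada API. It assumes only that the cache hands out the stored pointer itself and that a scheduling step mutates the object in place.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the binding types; not the Karmada API.
type StaticWeight struct {
	Cluster string
	Weight  int64
}

type ReplicaScheduling struct {
	WeightList []StaticWeight
}

type Placement struct {
	ReplicaScheduling *ReplicaScheduling
}

type Binding struct {
	Name      string
	Placement *Placement
}

// DeepCopy returns a copy that shares no memory with b.
func (b *Binding) DeepCopy() *Binding {
	if b == nil {
		return nil
	}
	out := &Binding{Name: b.Name}
	if b.Placement != nil {
		out.Placement = &Placement{}
		if rs := b.Placement.ReplicaScheduling; rs != nil {
			out.Placement.ReplicaScheduling = &ReplicaScheduling{
				WeightList: append([]StaticWeight(nil), rs.WeightList...),
			}
		}
	}
	return out
}

// cache mimics an informer store: Get hands out the stored pointer itself.
type cache map[string]*Binding

func (c cache) Get(name string) *Binding { return c[name] }

// schedule stands in for a scheduling step that fills in defaults in place.
func schedule(b *Binding) {
	rs := b.Placement.ReplicaScheduling
	rs.WeightList = append(rs.WeightList, StaticWeight{Cluster: "member1", Weight: 1})
}

// newCache builds a cache holding one binding with an empty weight list.
func newCache() cache {
	return cache{
		"demo": &Binding{
			Name:      "demo",
			Placement: &Placement{ReplicaScheduling: &ReplicaScheduling{}},
		},
	}
}

// cachedWeights reports how many weights the cached object currently holds.
func cachedWeights(c cache) int {
	return len(c.Get("demo").Placement.ReplicaScheduling.WeightList)
}

func main() {
	unsafeCache := newCache()
	schedule(unsafeCache.Get("demo")) // mutates the cached object itself
	fmt.Println("without DeepCopy:", cachedWeights(unsafeCache))

	safeCache := newCache()
	schedule(safeCache.Get("demo").DeepCopy()) // mutates a private copy only
	fmt.Println("with DeepCopy:", cachedWeights(safeCache))
}
```

Copying once at the entry point means no later step in the pipeline has to remember which fields are shared with the cache, at the cost of one full object copy per scheduling attempt.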
pkg/scheduler/core/assignment.go
Outdated
@@ -70,7 +70,8 @@ func newAssignState(candidates []*clusterv1alpha1.Cluster, placement *policyv1al
 		}
 	}
 
-	return &assignState{candidates: candidates, strategy: placement.ReplicaScheduling, spec: obj, strategyType: strategyType}
+	// Use ReplicaScheduling's copy to avoid accidental impact on the cache.
+	return &assignState{candidates: candidates, strategy: placement.ReplicaScheduling.DeepCopy(), spec: obj, strategyType: strategyType}
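As a side note on why a deep copy is used here rather than a plain struct copy, the sketch below contrasts the two. The `Strategy` and `WeightPreference` types are hypothetical simplifications, not the real `policyv1alpha1` definitions, and the hand-written `DeepCopy` methods only approximate what generated deep-copy code does.

```go
package main

import "fmt"

// Hypothetical, simplified types; not the real policyv1alpha1 definitions.
type WeightPreference struct {
	Weights map[string]int64
}

// DeepCopy duplicates the map so the copy shares no state with w.
func (w *WeightPreference) DeepCopy() *WeightPreference {
	if w == nil {
		return nil
	}
	out := &WeightPreference{Weights: make(map[string]int64, len(w.Weights))}
	for k, v := range w.Weights {
		out.Weights[k] = v
	}
	return out
}

type Strategy struct {
	Type             string
	WeightPreference *WeightPreference
}

// DeepCopy duplicates the strategy together with its nested preference.
func (s *Strategy) DeepCopy() *Strategy {
	if s == nil {
		return nil
	}
	return &Strategy{Type: s.Type, WeightPreference: s.WeightPreference.DeepCopy()}
}

// shallowCopy copies the struct value only; nested pointers stay shared.
func shallowCopy(s *Strategy) *Strategy {
	cp := *s
	return &cp
}

// setWeight mutates the strategy in place, as a scheduling step might.
func setWeight(s *Strategy, cluster string, weight int64) {
	s.WeightPreference.Weights[cluster] = weight
}

// newCached builds a strategy standing in for an object owned by the cache.
func newCached() *Strategy {
	return &Strategy{
		Type:             "Divided",
		WeightPreference: &WeightPreference{Weights: map[string]int64{}},
	}
}

func main() {
	cached := newCached()
	setWeight(shallowCopy(cached), "member1", 1)
	fmt.Println("after shallow copy:", len(cached.WeightPreference.Weights))

	cached = newCached()
	setWeight(cached.DeepCopy(), "member1", 1)
	fmt.Println("after deep copy:", len(cached.WeightPreference.Weights))
}
```

In this sketch the shallow copy still writes through to the cached map, while the deep copy leaves it untouched. A field-level `DeepCopy()` protects only that one field, which is the trade-off discussed in this thread.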
That's definitely a smart fix.
@RainbowMango
+1. First, thanks to @whitewindmills for finding this! The kube-scheduler also copies the pod from its cache every time it runs a scheduling process. Here I think safety is more important than performance. Moreover, this operation is not very time-consuming.
+1, I agree with it.
Now that we all agree with this approach, I will update this PR.
I updated the release note, by the way.
Signed-off-by: whitewindmills <jayfantasyhjh@gmail.com>
/lgtm
/approve
@whitewindmills Can you help cherry-pick this to the release branches (release-1.5, release-1.4, release-1.3)?
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: RainbowMango
Yeah, I'll do that.
…-#3393-upstream-release-1.5 Automated cherry pick of #3393: Avoid accidental impact for rb/crb's pointer on the cache
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #3361
Special notes for your reviewer:
Does this PR introduce a user-facing change?: