
Avoid accidental impact for WeightPreference on the cache #3393

Merged
1 commit merged into master on Apr 18, 2023

Conversation

whitewindmills (Member) commented Apr 11, 2023

What type of PR is this?
/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #3361

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

`karmada-scheduler`: Fixed unexpected re-scheduling due to mutating informer cache issue.
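
For context on the bug class: objects returned by an informer lister are shared with the local cache, so any in-place mutation leaks back into it. Below is a minimal illustrative Go sketch (it uses client-go's Pod lister rather than karmada's own types; none of these names come from this PR):

package example

import (
	corev1 "k8s.io/api/core/v1"
	listersv1 "k8s.io/client-go/listers/core/v1"
)

// Objects returned by an informer lister are pointers into the shared local
// cache; every consumer sees the same object, so an in-place mutation
// silently corrupts the cache for all of them.
func labelPod(lister listersv1.PodLister, namespace, name string) (*corev1.Pod, error) {
	pod, err := lister.Pods(namespace).Get(name) // shared cache object
	if err != nil {
		return nil, err
	}
	// WRONG: pod.Labels["scheduled"] = "true" here would write through to the cache.
	pod = pod.DeepCopy() // take a private copy before any mutation
	if pod.Labels == nil {
		pod.Labels = map[string]string{}
	}
	pod.Labels["scheduled"] = "true"
	return pod, nil
}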

@karmada-bot karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 11, 2023
@karmada-bot karmada-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Apr 11, 2023
codecov-commenter commented Apr 11, 2023

Codecov Report

Merging #3393 (5bb8eae) into master (ea128f9) will decrease coverage by 0.03%.
The diff coverage is 0.00%.


@@            Coverage Diff             @@
##           master    #3393      +/-   ##
==========================================
- Coverage   51.78%   51.76%   -0.03%     
==========================================
  Files         210      210              
  Lines       18928    18932       +4     
==========================================
- Hits         9802     9800       -2     
- Misses       8584     8590       +6     
  Partials      542      542              
Flag        Coverage Δ
unittests   51.76% <0.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files               Coverage Δ
pkg/scheduler/scheduler.go   18.10% <0.00%> (-0.13%) ⬇️

... and 1 file with indirect coverage changes


Poor12 (Member) commented Apr 11, 2023

Nice job!
/lgtm

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 11, 2023
XiShanYongYe-Chang (Member) commented:

Hi @whitewindmills, thanks! Can you help add a release note? We may need to cherry-pick this patch to the previous branches.

RainbowMango (Member) left a comment:

I will add it to my queue and will try to review it tomorrow.

whitewindmills (Member, Author) replied:

> Hi @whitewindmills, thanks! Can you help add a release note? We may need to cherry-pick this patch to the previous branches.

Thanks for your reminder.

RainbowMango (Member) left a comment:

I believe this patch fixes the bug exactly, but what I'm wondering is: if a similar issue (mutating the cache) exists somewhere else, how can we eliminate this problem fundamentally?

I just revisited the relevant code: the whole scheduling process works on a pointer into the cache, which is very scary.

So I tend to deep-copy the object immediately after the RB/CRB is listed from the cache. I know this approach might be more expensive, since we would copy a whole object instead of one field, but I still think we should do it.

@whitewindmills @Garrybest What's your opinion?

The following diff just illustrates my idea.

diff --git a/pkg/scheduler/scheduler.go b/pkg/scheduler/scheduler.go
index e372c695..e2561bce 100644
--- a/pkg/scheduler/scheduler.go
+++ b/pkg/scheduler/scheduler.go
@@ -298,6 +298,7 @@ func (s *Scheduler) doScheduleBinding(namespace, name string) (err error) {
                }
                return err
        }
+       rb = rb.DeepCopy()

        if rb.Spec.Placement == nil {
                // never reach here
@@ -345,6 +346,7 @@ func (s *Scheduler) doScheduleClusterBinding(name string) (err error) {
                }
                return err
        }
+       crb = crb.DeepCopy()

        if crb.Spec.Placement == nil {
                // never reach here

@@ -70,7 +70,8 @@ func newAssignState(candidates []*clusterv1alpha1.Cluster, placement *policyv1al
                }
        }

-       return &assignState{candidates: candidates, strategy: placement.ReplicaScheduling, spec: obj, strategyType: strategyType}
+       // Use ReplicaScheduling's copy to avoid accidental impact on the cache.
+       return &assignState{candidates: candidates, strategy: placement.ReplicaScheduling.DeepCopy(), spec: obj, strategyType: strategyType}
Member commented:

That's definitely a smart fix.

whitewindmills (Member, Author) replied:

@RainbowMango
I agree with your concern, but I don't have a better way for now. If we do this, I think we need to test the memory consumption.
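
(Aside: one way to test that memory concern is a Go micro-benchmark over ResourceBinding's generated DeepCopy, as a minimal sketch; the empty fixture below is illustrative and should be populated with a realistically sized Spec before trusting the numbers.)

package scheduler

import (
	"testing"

	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
)

func BenchmarkResourceBindingDeepCopy(b *testing.B) {
	rb := &workv1alpha2.ResourceBinding{} // illustrative: fill in a realistic Spec
	b.ReportAllocs()                      // report bytes and allocations per copy
	for i := 0; i < b.N; i++ {
		_ = rb.DeepCopy()
	}
}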

Garrybest (Member) commented:

+1. First, thanks to @whitewindmills for finding this!

kube-scheduler also copies the pod from the cache at the start of every scheduling cycle. Here I think safety is more important than performance; moreover, this operation is not very time-consuming.
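
(For reference, the shape of that pattern in kube-scheduler looks roughly like the sketch below — paraphrased from memory of its scheduling cycle, not an exact excerpt.)

package example

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// Paraphrased shape of kube-scheduler's scheduling cycle (not an exact
// excerpt): the pod is deep-copied before the scheduler records the assumed
// node on it, so the shared cache object is never mutated.
func assumeOnCopy(podInfo *framework.QueuedPodInfo, host string) *v1.Pod {
	assumedPodInfo := podInfo.DeepCopy()
	assumedPod := assumedPodInfo.Pod
	assumedPod.Spec.NodeName = host // mutate only the private copy
	return assumedPod
}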

Poor12 (Member) commented Apr 13, 2023

+1, I agree with it.

whitewindmills (Member, Author) commented:

Now that we all agree with this approach, I will update this PR.

RainbowMango (Member) commented:

I updated the release note, by the way.

Signed-off-by: whitewindmills <jayfantasyhjh@gmail.com>
@karmada-bot karmada-bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 18, 2023
RainbowMango (Member) left a comment:

/lgtm
/approve

@whitewindmills Can you help cherry-pick this to the release branches (release-1.5, release-1.4, release-1.3)?

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 18, 2023
karmada-bot (Collaborator) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 18, 2023
whitewindmills (Member, Author) replied:

Yeah, I'll do that.

karmada-bot added a commit that referenced this pull request Apr 19, 2023
…-#3393-upstream-release-1.5

Automated cherry pick of #3393: Avoid accidental impact for rb/crb's pointer on the cache
@RainbowMango RainbowMango added this to the v1.6 milestone May 16, 2023
@whitewindmills whitewindmills deleted the pointer branch May 31, 2023 02:21
Development

Successfully merging this pull request may close these issues.

The logic of karmada-scheduler judging whether the placement has changed could be optimized