Ensure the latest schedulerObservedGeneration if do not need to schedule #3455
Conversation
/assign @XiShanYongYe-Chang
Codecov Report
@@ Coverage Diff @@
## master #3455 +/- ##
==========================================
+ Coverage 51.94% 53.59% +1.65%
==========================================
Files 210 210
Lines 19077 19176 +99
==========================================
+ Hits 9910 10278 +368
+ Misses 8618 8346 -272
- Partials 549 552 +3
Thanks~
/lgtm
Maybe we need to describe the components in the release note.
Done.
/assign @RainbowMango
The root cause of this issue is that we use a patch to update the status after we have updated the .spec part, which causes the generation to increase.
I believe this patch could work around the problem, but it looks like a redundant design. I wonder if we have an alternative solution. cc @Garrybest for help.
Another option is to distinguish the case where the scheduler is filling in the scheduling result.
Here is my thought:
I guess this function could return the latest rb here, but we ignore the return value.
I agree with @Garrybest. Ensuring that the binding is the latest binding before patching status is a good solution.
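To make the idea concrete, here is a minimal sketch of the approach being discussed, not the exact code in this PR: keep the ResourceBinding returned by the spec patch and base the follow-up status patch on it. The function name, the patch payloads, and the import paths are my assumptions; it presumes the generated Karmada clientset and the status subresource on ResourceBinding.

```go
package example

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"

	karmadaclientset "github.com/karmada-io/karmada/pkg/generated/clientset/versioned"
)

// patchSpecThenStatus is a hypothetical helper: instead of discarding the
// binding returned by the spec patch, it uses that binding's generation when
// patching the status, so schedulerObservedGeneration cannot lag behind the
// generation that the spec patch itself just produced.
func patchSpecThenStatus(ctx context.Context, client karmadaclientset.Interface, ns, name string, specPatch []byte) error {
	updated, err := client.WorkV1alpha2().ResourceBindings(ns).Patch(ctx, name, types.MergePatchType, specPatch, metav1.PatchOptions{})
	if err != nil {
		return err
	}
	// Record the generation observed right after the spec patch.
	statusPatch := []byte(fmt.Sprintf(`{"status":{"schedulerObservedGeneration":%d}}`, updated.Generation))
	_, err = client.WorkV1alpha2().ResourceBindings(ns).Patch(ctx, name, types.MergePatchType, statusPatch, metav1.PatchOptions{}, "status")
	return err
}
```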
```diff
@@ -512,11 +512,13 @@ func (s *Scheduler) patchScheduleResultForResourceBinding(oldBinding *workv1alph
 		return nil
 	}
 
-	_, err = s.KarmadaClient.WorkV1alpha2().ResourceBindings(newBinding.Namespace).Patch(context.TODO(), newBinding.Name, types.MergePatchType, patchBytes, metav1.PatchOptions{})
+	result, err := s.KarmadaClient.WorkV1alpha2().ResourceBindings(newBinding.Namespace).Patch(context.TODO(), newBinding.Name, types.MergePatchType, patchBytes, metav1.PatchOptions{})
 	if err != nil {
```
Does this phenomenon still occur with this fix? Would you like to share your steps to reproduce?
I'm trying to reproduce it using the steps in the issue, but it seems the generation and the schedulerObservedGeneration are the same.
```diff
@@ -512,11 +512,13 @@ func (s *Scheduler) patchScheduleResultForResourceBinding(oldBinding *workv1alph
 		return nil
 	}
 
-	_, err = s.KarmadaClient.WorkV1alpha2().ResourceBindings(newBinding.Namespace).Patch(context.TODO(), newBinding.Name, types.MergePatchType, patchBytes, metav1.PatchOptions{})
```
Note: restarting karmada-controller-manager will also increment the generation of the RB, but it will not enter the scheduling process at that time.
> restarting karmada-controller-manager will also increment the generation of the RB

Why?
@chaunceyjiang is right. From my analysis, it's mainly because of the detector. For resource templates, we add them to the queue directly for add events. However, for update events, we only enqueue them if the spec of the resource template changes, for performance reasons.
karmada/pkg/detector/detector.go
Lines 280 to 308 in ec7b3b1
```go
func (d *ResourceDetector) OnAdd(obj interface{}) {
	runtimeObj, ok := obj.(runtime.Object)
	if !ok {
		return
	}
	d.Processor.Enqueue(runtimeObj)
}

// OnUpdate handles object update event and push the object to queue.
func (d *ResourceDetector) OnUpdate(oldObj, newObj interface{}) {
	unstructuredOldObj, err := helper.ToUnstructured(oldObj)
	if err != nil {
		klog.Errorf("Failed to transform oldObj, error: %v", err)
		return
	}
	unstructuredNewObj, err := helper.ToUnstructured(newObj)
	if err != nil {
		klog.Errorf("Failed to transform newObj, error: %v", err)
		return
	}
	if !SpecificationChanged(unstructuredOldObj, unstructuredNewObj) {
		klog.V(4).Infof("Ignore update event of object (kind=%s, %s/%s) as specification no change", unstructuredOldObj.GetKind(), unstructuredOldObj.GetNamespace(), unstructuredOldObj.GetName())
		return
	}
	d.OnAdd(newObj)
}
```
After the status is aggregated into the resource template, the resourceVersion in rb.spec.resource is inconsistent with that of the actual resource template. This normally works without issue. However, when the controller-manager restarts, the detector adds all resource templates again, so we update rb.spec.resource.resourceVersion to the latest value and the generation increases by 1. But the scheduler thinks it does not need to schedule and does not enter the scheduling process, so generation and schedulerObservedGeneration end up different.
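If I read the design correctly, the "redundant" check discussed earlier boils down to something like the sketch below: even when the scheduler decides there is nothing to schedule, it still catches schedulerObservedGeneration up to metadata.generation. The function name, the needSchedule flag, and the import paths are illustrative assumptions, not the exact code in this PR.

```go
package example

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"

	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
	karmadaclientset "github.com/karmada-io/karmada/pkg/generated/clientset/versioned"
)

// syncObservedGeneration illustrates the idea: even when nothing needs to be
// scheduled (e.g. after a controller-manager restart bumped the generation via
// rb.spec.resource.resourceVersion), the scheduler still brings the observed
// generation in line with metadata.generation.
func syncObservedGeneration(ctx context.Context, client karmadaclientset.Interface, rb *workv1alpha2.ResourceBinding, needSchedule bool) error {
	if needSchedule {
		// The normal scheduling path would run here and record the generation
		// as part of patching the schedule result.
		return nil
	}
	if rb.Status.SchedulerObservedGeneration == rb.Generation {
		return nil // already up to date, nothing to do
	}
	patch := []byte(fmt.Sprintf(`{"status":{"schedulerObservedGeneration":%d}}`, rb.Generation))
	_, err := client.WorkV1alpha2().ResourceBindings(rb.Namespace).Patch(ctx, rb.Name, types.MergePatchType, patch, metav1.PatchOptions{}, "status")
	return err
}
```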
In this case, returning the latest binding after patching the schedule result will not work, but the former redundant design does.
> but the former redundant design does.

Yes.
> After the status is aggregated into the resource template, the resourceVersion in rb.spec.resource is inconsistent with that of the actual resource template. This normally works without issue. However, when the controller-manager restarts, the detector adds all resource templates again, so we update rb.spec.resource.resourceVersion to the latest value and the generation increases by 1.
By the way, the graceful-eviction-controller currently relies too much on this mechanism. I think the graceful-eviction-controller should also handle the create event.
@RainbowMango @Poor12 What do you think?
> I think the graceful-eviction-controller should also handle the create event.

What do you mean by handling the create event?
Do we need to do anything else in this PR?
> Do we need to do anything else in this PR?

No, this PR solves my problem.
Signed-off-by: Poor12 <shentiecheng@huawei.com>
@chaunceyjiang, could you please help verify whether this fix can solve your problem? Great thanks.
Yes. I also used this patch in my local env.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: RainbowMango. The full list of commands accepted by this bot can be found here. The pull request process is described here.
…pstream-release-1.5 Automated cherry pick of #3455: fix generation inconsistent
What type of PR is this?
/kind bug
What this PR does / why we need it:
Ensure the latest schedulerObservedGeneration when the binding does not need to be scheduled.
Which issue(s) this PR fixes:
Fixes #3454
Fixes #3467
Special notes for your reviewer:
This PR needs to be cherry-picked.
Does this PR introduce a user-facing change?: