Failover controllers now build eviction tasks for purgeMode Immediately #5881
Conversation
Force-pushed from 813f011 to 5872929
workv1alpha2.WithPurgeMode(policyv1alpha1.Immediately),
workv1alpha2.WithProducer(RBApplicationFailoverControllerName),
workv1alpha2.WithReason(workv1alpha2.EvictionReasonApplicationFailure),
workv1alpha2.WithGracePeriodSeconds(binding.Spec.Failover.Application.GracePeriodSeconds)))
I included a grace period window of 0s for the Immediately purgeMode because otherwise, if the grace period is not set (which it will not be for Immediately), the default grace period of 10 minutes is applied: https://github.com/karmada-io/karmada/blob/master/cmd/controller-manager/app/options/options.go#L226
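For context, a minimal sketch of the fallback being described, assuming the task carries an optional GracePeriodSeconds and the controller-manager default is 10 minutes; the function and variable names are illustrative, not the actual options.go code:

package main

import "time"

// effectiveGracePeriod sketches the fallback described above: if a graceful
// eviction task has no explicit grace period, the controller falls back to
// the controller-manager default (assumed to be 10 minutes per the linked
// options.go).
func effectiveGracePeriod(gracePeriodSeconds *int32) time.Duration {
	if gracePeriodSeconds != nil {
		return time.Duration(*gracePeriodSeconds) * time.Second
	}
	return 10 * time.Minute // assumed default; see the linked options.go
}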
If we set GracePeriodSeconds to 0, the following effect will occur: as soon as the cluster is added to the eviction task, it will be cleaned up immediately, which may not be what we expect.
Therefore, I think it is OK to use the maximum wait time of 10 minutes here. For clusters with the Immediately purgeMode, the logic for clearing the eviction queue can be the same as for the Graceful purgeMode; the difference lies in whether the old work is deleted immediately.
What do you think?
Makes sense. As long as we can ensure that the old work is cleared immediately before we decide to reschedule, that should be okay.
@@ -157,8 +157,11 @@ func (c *RBApplicationFailoverController) evictBinding(binding *workv1alpha2.Res
	switch binding.Spec.Failover.Application.PurgeMode {
	case policyv1alpha1.Graciously:
		if features.FeatureGate.Enabled(features.GracefulEviction) {
			binding.Spec.GracefulEvictCluster(cluster, workv1alpha2.NewTaskOptions(workv1alpha2.WithProducer(RBApplicationFailoverControllerName),
				workv1alpha2.WithReason(workv1alpha2.EvictionReasonApplicationFailure), workv1alpha2.WithGracePeriodSeconds(binding.Spec.Failover.Application.GracePeriodSeconds)))
			binding.Spec.GracefulEvictCluster(cluster, workv1alpha2.NewTaskOptions(
Eventually this can be combined into a single case statement covering both Graciously and Immediately, but while the FeatureGate exists, we'll need to keep them separate.
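A minimal sketch of what that combined branch might look like once the GracefulEviction feature gate is removed; this is illustrative only, not part of this PR:

case policyv1alpha1.Graciously, policyv1alpha1.Immediately:
	// Both modes build a graceful eviction task; the task's PurgeMode tells
	// the graceful-eviction controller whether the old work is deleted
	// immediately or only after the grace period.
	binding.Spec.GracefulEvictCluster(cluster, workv1alpha2.NewTaskOptions(
		workv1alpha2.WithPurgeMode(binding.Spec.Failover.Application.PurgeMode),
		workv1alpha2.WithProducer(RBApplicationFailoverControllerName),
		workv1alpha2.WithReason(workv1alpha2.EvictionReasonApplicationFailure),
		workv1alpha2.WithGracePeriodSeconds(binding.Spec.Failover.Application.GracePeriodSeconds)))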
Agree.
Besides, do we also need to deal with the Never situation?
Besides, do we also need to deal with the Never situation?
+1
I see, so in the case of Never we can generate a GracefulEvictionTask (which won't actually evict the workload), but which will be used to filter out the cluster that the application is failing over from?
What are the use cases for an application never being cleaned up? Just out of curiosity.
I see, so in the case of Never we can generate a GracefulEvictionTask (which won't actually evict the workload), but which will be used to filter out the cluster that the application is failing over from?
Yes, exactly.
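A minimal sketch of how such a Never branch might look, mirroring the other cases; whether the graceful-eviction controller actually skips deleting the old work for Never tasks is an assumption here, not something this PR implements:

case policyv1alpha1.Never:
	// Record an eviction task so rescheduling filters out the failed
	// cluster, without ever deleting the old work (assumed behaviour for
	// the Never purge mode).
	binding.Spec.GracefulEvictCluster(cluster, workv1alpha2.NewTaskOptions(
		workv1alpha2.WithPurgeMode(policyv1alpha1.Never),
		workv1alpha2.WithProducer(RBApplicationFailoverControllerName),
		workv1alpha2.WithReason(workv1alpha2.EvictionReasonApplicationFailure)))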
Codecov Report
Attention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5881      +/-   ##
==========================================
+ Coverage   46.18%   46.27%   +0.08%
==========================================
  Files         663      663
  Lines       54592    54604      +12
==========================================
+ Hits        25215    25267      +52
+ Misses      27752    27712      -40
  Partials     1625     1625

Flags with carried forward coverage won't be shown.
View full report in Codecov by Sentry.
/assign
Thanks a lot~
@@ -172,7 +176,11 @@ func (c *CRBApplicationFailoverController) evictBinding(binding *workv1alpha2.Cl
		return err
	}
	case policyv1alpha1.Immediately:
		binding.Spec.RemoveCluster(cluster)
		binding.Spec.GracefulEvictCluster(cluster, workv1alpha2.NewTaskOptions(
			workv1alpha2.WithPurgeMode(binding.Spec.Failover.Application.PurgeMode),
This purgeMode may need to be policyv1alpha1.Immediately.
@XiShanYongYe-Chang
What's the difference?
Oh, ignore me.
Signed-off-by: mszacillo <mszacillo@bloomberg.net>
Force-pushed from 5872929 to a393653
Thanks~
/lgtm
cc @RainbowMango
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: RainbowMango
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Part of #5788
Special notes for your reviewer:
Does this PR introduce a user-facing change?: