
Failover controllers now build eviction tasks for purgemode immediately #5881

Merged · 1 commit · Nov 27, 2024

Conversation

mszacillo (Contributor):

What type of PR is this?
/kind feature

What this PR does / why we need it:
Part of #5788

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Build eviction task for application failover when using purgeMode Immediately

@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 26, 2024
@karmada-bot karmada-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 26, 2024
workv1alpha2.WithPurgeMode(policyv1alpha1.Immediately),
workv1alpha2.WithProducer(RBApplicationFailoverControllerName),
workv1alpha2.WithReason(workv1alpha2.EvictionReasonApplicationFailure),
workv1alpha2.WithGracePeriodSeconds(binding.Spec.Failover.Application.GracePeriodSeconds)))
mszacillo (Contributor, Author):

I included a grace period of 0s for the Immediately purgeMode because otherwise, if the grace period is not set (which it will not be for Immediately), the default grace period of 10 minutes applies: https://github.com/karmada-io/karmada/blob/master/cmd/controller-manager/app/options/options.go#L226

Member:

If we set GracePeriodSeconds to 0, then as soon as the cluster is added to the eviction task it will be cleaned up immediately, which may not be what we expect.

Therefore, I think it is OK to keep the maximum wait time of 10 minutes here. For clusters with the Immediately purgeMode, the logic for clearing the eviction queue can be the same as for the Graceful purgeMode; the difference lies in whether the old work is deleted immediately.

What do you think?

mszacillo (Contributor, Author):

Makes sense. As long as we can ensure that the old work is cleared immediately before we decide to reschedule, that should be okay.

@@ -157,8 +157,11 @@ func (c *RBApplicationFailoverController) evictBinding(binding *workv1alpha2.Res
 	switch binding.Spec.Failover.Application.PurgeMode {
 	case policyv1alpha1.Graciously:
 		if features.FeatureGate.Enabled(features.GracefulEviction) {
-			binding.Spec.GracefulEvictCluster(cluster, workv1alpha2.NewTaskOptions(workv1alpha2.WithProducer(RBApplicationFailoverControllerName),
-				workv1alpha2.WithReason(workv1alpha2.EvictionReasonApplicationFailure), workv1alpha2.WithGracePeriodSeconds(binding.Spec.Failover.Application.GracePeriodSeconds)))
+			binding.Spec.GracefulEvictCluster(cluster, workv1alpha2.NewTaskOptions(
mszacillo (Contributor, Author):

Eventually this can be combined into a single case statement covering both Graciously and Immediately, but while the FeatureGate exists, we'll need to keep them separate.
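The eventual merged case could look roughly like the following. This is a hedged sketch with simplified stand-in types (the real controller operates on workv1alpha2.ResourceBinding); it only illustrates the shape the thread is pointing at: once the GracefulEviction feature gate is gone, every purge mode records an eviction task, and only the mode carried on the task differs.

```go
package main

import "fmt"

// PurgeMode is a stand-in for policyv1alpha1's purge modes.
type PurgeMode string

const (
	Graciously  PurgeMode = "Graciously"
	Immediately PurgeMode = "Immediately"
	Never       PurgeMode = "Never"
)

// EvictionTask is a simplified stand-in for a graceful eviction task.
type EvictionTask struct {
	Cluster   string
	PurgeMode PurgeMode
}

// buildEvictionTask sketches the combined case statement: all modes
// record a task so the scheduler can filter out the failing cluster;
// the purge mode decides whether (and when) the old work is deleted.
func buildEvictionTask(cluster string, mode PurgeMode) *EvictionTask {
	switch mode {
	case Graciously, Immediately, Never:
		return &EvictionTask{Cluster: cluster, PurgeMode: mode}
	default:
		return nil
	}
}

func main() {
	t := buildEvictionTask("member1", Immediately)
	fmt.Printf("%+v\n", *t) // {Cluster:member1 PurgeMode:Immediately}
}
```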

Member:

Agreed.

Besides, do we also need to handle the Never case?

Member:

> Besides, do we also need to deal with Never's situation?

+1

mszacillo (Contributor, Author):

I see, so in the case of Never we can generate a GracefulEvictionTask (which won't actually evict the workload), but will be used to filter out the cluster that the application is failing over from?

What are the use cases for an application never being cleaned up? Just out of curiosity.

Member:

> I see, so in the case of Never we can generate a GracefulEvictionTask (which won't actually evict the workload), but will be used to filter out the cluster that the application is failing over from?

Yes, exactly.
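The Never behavior confirmed above can be sketched as a scheduling filter. This is an illustrative stand-in, not Karmada's actual scheduler code: clusters that appear in the binding's eviction tasks (including Never-mode tasks, where the workload is intentionally left in place) are simply excluded from the rescheduling candidates.

```go
package main

import "fmt"

// EvictionTask is a simplified stand-in: for Never, the task exists
// only to mark the cluster the application is failing over from.
type EvictionTask struct{ Cluster string }

// filterEvictedClusters drops any candidate cluster that has a pending
// eviction task, so the workload is not rescheduled back onto it.
func filterEvictedClusters(candidates []string, tasks []EvictionTask) []string {
	evicted := make(map[string]bool, len(tasks))
	for _, t := range tasks {
		evicted[t.Cluster] = true
	}
	var out []string
	for _, c := range candidates {
		if !evicted[c] {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	// member1 is the cluster being failed over from; its Never-mode
	// task filters it out even though nothing is deleted there.
	tasks := []EvictionTask{{Cluster: "member1"}}
	fmt.Println(filterEvictedClusters([]string{"member1", "member2", "member3"}, tasks))
	// [member2 member3]
}
```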

codecov-commenter:
⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 92.59259% with 2 lines in your changes missing coverage. Please review.

Project coverage is 46.27%. Comparing base (72cfef5) to head (5872929).
Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
pkg/util/helper/policy.go 71.42% 2 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5881      +/-   ##
==========================================
+ Coverage   46.18%   46.27%   +0.08%     
==========================================
  Files         663      663              
  Lines       54592    54604      +12     
==========================================
+ Hits        25215    25267      +52     
+ Misses      27752    27712      -40     
  Partials     1625     1625              
Flag Coverage Δ
unittests 46.27% <92.59%> (+0.08%) ⬆️


RainbowMango (Member):

/assign

XiShanYongYe-Chang (Member):

Thanks a lot~

@@ -172,7 +176,11 @@ func (c *CRBApplicationFailoverController) evictBinding(binding *workv1alpha2.Cl
 			return err
 		}
 	case policyv1alpha1.Immediately:
-		binding.Spec.RemoveCluster(cluster)
+		binding.Spec.GracefulEvictCluster(cluster, workv1alpha2.NewTaskOptions(
+			workv1alpha2.WithPurgeMode(binding.Spec.Failover.Application.PurgeMode),
Member:

This purgeMode may need to be policyv1alpha1.Immediately.

Member:

@XiShanYongYe-Chang
What's the difference?

Member:

Oh, ignore me.


Signed-off-by: mszacillo <mszacillo@bloomberg.net>
XiShanYongYe-Chang (Member):

Thanks~
/lgtm
cc @RainbowMango

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 27, 2024
RainbowMango (Member):

/approve

karmada-bot (Collaborator):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 27, 2024
@RainbowMango RainbowMango added this to the v1.12 milestone Nov 27, 2024
@karmada-bot karmada-bot merged commit f168061 into karmada-io:master Nov 27, 2024
18 checks passed