
Failover controllers now build eviction tasks for purgemode immediately #5881

Merged · 1 commit · Nov 27, 2024

Conversation

mszacillo (Contributor):

What type of PR is this?
/kind feature

What this PR does / why we need it:
Part of #5788

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Build eviction task for application failover when using purgeMode Immediately

@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 26, 2024
@karmada-bot karmada-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 26, 2024
workv1alpha2.WithPurgeMode(policyv1alpha1.Immediately),
workv1alpha2.WithProducer(RBApplicationFailoverControllerName),
workv1alpha2.WithReason(workv1alpha2.EvictionReasonApplicationFailure),
workv1alpha2.WithGracePeriodSeconds(binding.Spec.Failover.Application.GracePeriodSeconds)))
mszacillo (Contributor, Author):

I included a grace period of 0s for the Immediately purgeMode because otherwise, if the grace period is not set (which it will not be for Immediately), the default grace period of 10 minutes applies: https://github.com/karmada-io/karmada/blob/master/cmd/controller-manager/app/options/options.go#L226

Member:

If we set GracePeriodSeconds to 0, then as soon as the cluster is added to the eviction task it will be cleaned up immediately, which may not be what we expect.

Therefore, I think it is OK to keep the maximum wait time of 10 minutes here. For clusters with the Immediately purgeMode, the logic for clearing the eviction queue can be the same as for the Graceful purgeMode; the difference lies in whether the old work is deleted immediately.

What do you think?

mszacillo (Contributor, Author):

Makes sense. As long as we can ensure that the old work is cleared immediately before we decide to reschedule, that should be okay.

@@ -157,8 +157,11 @@ func (c *RBApplicationFailoverController) evictBinding(binding *workv1alpha2.Res
 	switch binding.Spec.Failover.Application.PurgeMode {
 	case policyv1alpha1.Graciously:
 		if features.FeatureGate.Enabled(features.GracefulEviction) {
-			binding.Spec.GracefulEvictCluster(cluster, workv1alpha2.NewTaskOptions(workv1alpha2.WithProducer(RBApplicationFailoverControllerName),
-				workv1alpha2.WithReason(workv1alpha2.EvictionReasonApplicationFailure), workv1alpha2.WithGracePeriodSeconds(binding.Spec.Failover.Application.GracePeriodSeconds)))
+			binding.Spec.GracefulEvictCluster(cluster, workv1alpha2.NewTaskOptions(
mszacillo (Contributor, Author):

Eventually this can be combined into a single case statement covering both Graciously and Immediately, but while the FeatureGate exists, we'll need to keep them separate.
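The eventual merged case could look roughly like the following. This is a hedged sketch with simplified stand-in types (the real controller operates on workv1alpha2.ResourceBinding); it only illustrates the shape the thread is pointing at: once the GracefulEviction feature gate is gone, every purge mode records an eviction task, and only the mode carried on the task differs.

```go
package main

import "fmt"

// PurgeMode is a stand-in for policyv1alpha1's purge modes.
type PurgeMode string

const (
	Graciously  PurgeMode = "Graciously"
	Immediately PurgeMode = "Immediately"
	Never       PurgeMode = "Never"
)

// EvictionTask is a simplified stand-in for a graceful eviction task.
type EvictionTask struct {
	Cluster   string
	PurgeMode PurgeMode
}

// buildEvictionTask sketches the combined case statement: all modes
// record a task so the scheduler can filter out the failing cluster;
// the purge mode decides whether (and when) the old work is deleted.
func buildEvictionTask(cluster string, mode PurgeMode) *EvictionTask {
	switch mode {
	case Graciously, Immediately, Never:
		return &EvictionTask{Cluster: cluster, PurgeMode: mode}
	default:
		return nil
	}
}

func main() {
	t := buildEvictionTask("member1", Immediately)
	fmt.Printf("%+v\n", *t) // {Cluster:member1 PurgeMode:Immediately}
}
```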

Member:

Agreed.

Besides, do we also need to handle the Never case?

Member:

> Besides, do we also need to deal with Never's situation?

+1

mszacillo (Contributor, Author):

I see, so in the case of Never we can generate a GracefulEvictionTask (which won't actually evict the workload), but will be used to filter out the cluster that the application is failing over from?

What are the use cases for an application never being cleaned up? Just out of curiosity.

Member:

> I see, so in the case of Never we can generate a GracefulEvictionTask (which won't actually evict the workload), but will be used to filter out the cluster that the application is failing over from?

Yes, exactly.
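The Never behavior confirmed above can be sketched as a scheduling filter. This is an illustrative stand-in, not Karmada's actual scheduler code: clusters that appear in the binding's eviction tasks (including Never-mode tasks, where the workload is intentionally left in place) are simply excluded from the rescheduling candidates.

```go
package main

import "fmt"

// EvictionTask is a simplified stand-in: for Never, the task exists
// only to mark the cluster the application is failing over from.
type EvictionTask struct{ Cluster string }

// filterEvictedClusters drops any candidate cluster that has a pending
// eviction task, so the workload is not rescheduled back onto it.
func filterEvictedClusters(candidates []string, tasks []EvictionTask) []string {
	evicted := make(map[string]bool, len(tasks))
	for _, t := range tasks {
		evicted[t.Cluster] = true
	}
	var out []string
	for _, c := range candidates {
		if !evicted[c] {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	// member1 is the cluster being failed over from; its Never-mode
	// task filters it out even though nothing is deleted there.
	tasks := []EvictionTask{{Cluster: "member1"}}
	fmt.Println(filterEvictedClusters([]string{"member1", "member2", "member3"}, tasks))
	// [member2 member3]
}
```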

codecov-commenter:
⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 92.59259% with 2 lines in your changes missing coverage. Please review.

Project coverage is 46.27%. Comparing base (72cfef5) to head (5872929).
Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
pkg/util/helper/policy.go 71.42% 2 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5881      +/-   ##
==========================================
+ Coverage   46.18%   46.27%   +0.08%     
==========================================
  Files         663      663              
  Lines       54592    54604      +12     
==========================================
+ Hits        25215    25267      +52     
+ Misses      27752    27712      -40     
  Partials     1625     1625              
Flag Coverage Δ
unittests 46.27% <92.59%> (+0.08%) ⬆️


RainbowMango (Member):

/assign

XiShanYongYe-Chang (Member):

Thanks a lot~

@@ -172,7 +176,11 @@ func (c *CRBApplicationFailoverController) evictBinding(binding *workv1alpha2.Cl
 			return err
 		}
 	case policyv1alpha1.Immediately:
-		binding.Spec.RemoveCluster(cluster)
+		binding.Spec.GracefulEvictCluster(cluster, workv1alpha2.NewTaskOptions(
+			workv1alpha2.WithPurgeMode(binding.Spec.Failover.Application.PurgeMode),
Member:

This purgeMode may need to be policyv1alpha1.Immediately.

Member:

@XiShanYongYe-Chang
What's the difference?

Member:

Oh, ignore me.


Signed-off-by: mszacillo <mszacillo@bloomberg.net>
XiShanYongYe-Chang (Member):

Thanks~
/lgtm
cc @RainbowMango

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Nov 27, 2024
RainbowMango (Member):

/approve

karmada-bot (Collaborator):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 27, 2024
@RainbowMango RainbowMango added this to the v1.12 milestone Nov 27, 2024
@karmada-bot karmada-bot merged commit f168061 into karmada-io:master Nov 27, 2024
18 checks passed