Improve the user experience of Failover #5150
Are you saying that resources on unhealthy member clusters will be deleted because they're being migrated to other member clusters? If so, how can that be improved? |
That's right. One thing to note is that the resources are not migrated to another cluster; they are just evicted from the failed cluster.
I hope to hear everyone's opinion. |
Let me guess how it happened: no fit cluster? |
In other words, the target clusters are listed explicitly, and the configuration resources are distributed to those clusters. |
Does it look like this, if the clusters are listed directly?

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: test-pp
  namespace: default
spec:
  resourceSelectors:
  - apiVersion: v1
    kind: ConfigMap
    name: conf
  placement:
    clusterAffinity:
      clusterNames:
      - foo
      - bar
...
```

|
Yes, you are right. |
yes, this is a noteworthy case where we would prefer not to delete resources when there is no new cluster to migrate to. |
We just had an issue where a cluster became NotReady for about 2 minutes. Karmada deleted all the resources and then recreated them in that member cluster. It seems that Karmada failed to handle the cluster recovery properly. |
@NickYadance thanks for your feedback, that's where improvement is needed. |
/assign |
Thanks for your feedback. The cluster came back up two minutes later? In addition, the failover feature gate is enabled by default. Is this what you expect? |
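For reference, a sketch of how these feature gates might be disabled on karmada-controller-manager. The `--feature-gates` flag follows the standard Kubernetes convention, but the gate names (`Failover`, `GracefulEviction`) and the container layout below are assumptions to verify against your Karmada version:

```yaml
# Hypothetical excerpt of a karmada-controller-manager Deployment spec.
containers:
- name: karmada-controller-manager
  image: docker.io/karmada/karmada-controller-manager:v1.10.0  # version is illustrative
  command:
  - /bin/karmada-controller-manager
  - --kubeconfig=/etc/kubeconfig
  # Turn off automatic failover and graceful eviction entirely:
  - --feature-gates=Failover=false,GracefulEviction=false
```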
Thanks @whitewindmills, this is a feasible solution. |
Yes, the cluster came back up two minutes later. I would prefer to control the failover process manually, something like: "Hey Karmada, fail over the resources in member A to member B when member A is down. Otherwise, don't do anything unexpected." |
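One way to approximate that kind of manual control today is through taint-based eviction: tolerate the not-ready taint for a long period so a brief outage never triggers automatic eviction. This is a sketch only; the `clusterTolerations` field and the taint key `cluster.karmada.io/not-ready` are assumptions to check against the Karmada version in use:

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: test-pp
  namespace: default
spec:
  resourceSelectors:
  - apiVersion: v1
    kind: ConfigMap
    name: conf
  placement:
    clusterAffinity:
      clusterNames:
      - foo
      - bar
    # Tolerate the not-ready taint for a day so that a short cluster
    # outage does not evict the workload (field/key names assumed).
    clusterTolerations:
    - key: cluster.karmada.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 86400
```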
May I know where in the code the issue is caused? I tried to reproduce the issue offline but failed. From what I found, the failover timeout defaults to 5 minutes, so the resources in the old cluster shouldn't be evicted if it is down for only 2 minutes. @whitewindmills karmada/pkg/controllers/cluster/cluster_controller.go Lines 626 to 633 in 04a4d84
|
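The five-minute default mentioned above corresponds to a controller flag. A hedged sketch of tuning it follows; the flag name `--failover-eviction-timeout` is an assumption, so confirm it with `karmada-controller-manager --help` before relying on it:

```shell
# Lengthen how long a cluster may stay NotReady before workloads
# are evicted (default assumed to be 5m). Flag name is assumed.
/bin/karmada-controller-manager \
  --kubeconfig=/etc/kubeconfig \
  --failover-eviction-timeout=30m
```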
What would you like to be added:
Improve the user experience of Failover
Why is this needed:
The `Failover` and `GracefulEviction` features are currently in the Beta phase, which means they are enabled by default. There is a scenario where users propagate configuration resources by directly specifying the cluster names. When a cluster is disconnected from the Karmada control plane for several hours, it is identified as `NotReady`. Once the cluster recovers, the configuration resources on that cluster are deleted unexpectedly. If this occurs in a production environment, it could lead to serious consequences. Therefore, we need to optimize the Failover feature for this scenario to provide users with a more stable and reliable experience.