
Member cluster health checking does not work #2571

Closed
alex-wong123 opened this issue Sep 23, 2022 · 19 comments · Fixed by karmada-io/website#182
Assignees
Labels
kind/question Indicates an issue that is a support question.

Comments

@alex-wong123

Please provide an in-depth description of the question you have:
After registering a member cluster to Karmada in Push mode, "kubectl get cluster" showed the cluster status as Ready.
Then I disconnected the member cluster with a firewall; after more than 10 minutes the cluster status was still Ready instead of changing to not ready.
Are there any configurations needed for cluster health checking?
What do you think about this question?:

Environment:

  • Karmada version: 1.3.0
  • Kubernetes version: 1.23.4
  • Others:
@alex-wong123 added the kind/question label on Sep 23, 2022
@RainbowMango
Member

Are there any configurations needed for cluster health checking?

No, Karmada takes care of the cluster status automatically based on heartbeats.
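
For reference, the result of that health check is recorded on the Cluster object's Ready condition, so you can inspect it directly (the cluster name below is just a placeholder):

  # show the conditions maintained by Karmada's cluster status controller
  kubectl get cluster member1 -o jsonpath='{.status.conditions}'

  # or a human-readable view of the whole object, including its status
  kubectl describe cluster member1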

@jwcesign Could you please help to confirm this?
I guess we can try to reproduce it with the following steps (a combined sketch follows the list):

  1. launch Karmada by command: hack/local-up-karmada.sh
  2. wait for cluster status becomes ready: kubectl get clusters
  3. delete cluster member1 by command: kind delete cluster --name member1 (to simulate the network broken)
  4. wait for cluster status changes: watch kubectl get clusters
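
Putting those steps together, a rough reproduction script might look like this (a sketch only: it assumes kind is installed, that it runs from the root of the karmada repository, and that the kubeconfig path and context names follow the local-up script's defaults):

  # 1. launch a local Karmada environment (creates kind clusters member1/member2/member3)
  hack/local-up-karmada.sh
  export KUBECONFIG="$HOME/.kube/karmada.config"
  kubectl config use-context karmada-apiserver

  # 2. wait until every member cluster reports Ready
  kubectl get clusters

  # 3. simulate a broken network by deleting one member cluster entirely
  kind delete cluster --name member1

  # 4. the Ready column for member1 should flip to False after a short while
  kubectl get clusters --watch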

@jwcesign
Member

jwcesign commented Sep 23, 2022

I tested with the release-1.3 branch, and it looks like it works:

jw@ecs-3fa1 [03:04:27 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % export KUBECONFIG=/home/jw/.kube/karmada.config
jw@ecs-3fa1 [03:04:38 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % kubectl get clusters
NAME              VERSION   MODE   READY   AGE
jw-member1-push   v1.23.4   Push   True    48s
jw-member2-push   v1.23.4   Push   True    43s
jw-member3-pull   v1.23.4   Pull   True    33s
jw@ecs-3fa1 [03:04:39 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % kind get clusters
jw-karmada-host
jw-member1-push
jw-member2-push
jw-member3-pull
karmada-host
member1
member2
member3
jw@ecs-3fa1 [03:05:07 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % kind delete cluster --name jw-member1-push
Deleting cluster "jw-member1-push" ...
jw@ecs-3fa1 [03:05:21 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % watch kubectl get clusters
jw@ecs-3fa1 [03:05:38 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % kubectl get clusters
NAME              VERSION   MODE   READY   AGE
jw-member1-push   v1.23.4   Push   True    112s
jw-member2-push   v1.23.4   Push   True    107s
jw-member3-pull   v1.23.4   Pull   True    97s
jw@ecs-3fa1 [03:05:43 PM] [~/workspace/git/karmada-diff/karmada-official] [release-1.3 *]
-> % kubectl get clusters --watch
NAME              VERSION   MODE   READY   AGE
jw-member1-push   v1.23.4   Push   True    115s
jw-member2-push   v1.23.4   Push   True    110s
jw-member3-pull   v1.23.4   Pull   True    100s
jw-member1-push   v1.23.4   Push   False   2m39s
jw-member1-push   v1.23.4   Push   False   2m39s
jw-member1-push   v1.23.4   Push   False   3m9s

cc @RainbowMango

@chaunceyjiang
Member

I had the same problem once; the cause was that the firewall did not close the TCP connections that were already established.

After you enable the firewall, you can use the tcpkill command to close the existing TCP connection, for example:

tcpkill -9 -i ens192 src host 10.70.4.241 and dst port 6443
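
If you prefer not to use tcpkill, a rough alternative on the member cluster's API server node, assuming an iptables-based firewall and the conntrack CLI, and reusing the same placeholder address and port as above (10.70.4.241 standing in for the Karmada control plane):

  # block traffic from the Karmada control plane to the member API server
  iptables -I INPUT -s 10.70.4.241 -p tcp --dport 6443 -j DROP

  # delete the conntrack entries so rules that accept ESTABLISHED traffic
  # no longer keep the old sessions alive
  conntrack -D -s 10.70.4.241

  # undo the block once the test is finished
  iptables -D INPUT -s 10.70.4.241 -p tcp --dport 6443 -j DROP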

@RainbowMango
Member

@jwcesign Please use v1.3.0 and try again.

@alex-wong123 As far as I remember, we haven't changed the health-detection behavior since v1.3.0, so @jwcesign's testing should be representative.

Thanks @chaunceyjiang for the information, that's probably the cause :)
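
A quick way to confirm that on the member cluster's API server node (reusing the placeholder address from the tcpkill example): if the old sessions are still established, the status will keep reporting Ready.

  # list established TCP sessions involving the control plane address
  ss -tn state established | grep 10.70.4.241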

@jwcesign
Member

Same result with v1.3.0, so I think @chaunceyjiang's answer is right.

To start using your karmada, run:
  export KUBECONFIG=/home/jw/.kube/karmada.config
Please use 'kubectl config use-context jw-karmada-host/karmada-apiserver' to switch the host and control plane cluster.

To manage your member clusters, run:
  export KUBECONFIG=/home/jw/.kube/members.config
Please use 'kubectl config use-context jw-member1-push/jw-member2-push/jw-member3-pull' to switch to the different member cluster.
jw@ecs-3fa1 [03:32:15 PM] [~/workspace/git/karmada-diff/karmada-official] [12e8f01d *]
-> % kubectl get clusters
NAME              VERSION   MODE   READY   AGE
jw-member1-push   v1.23.4   Push   True    39s
jw-member2-push   v1.23.4   Push   True    34s
jw-member3-pull   v1.23.4   Pull   True    7s
jw@ecs-3fa1 [03:32:21 PM] [~/workspace/git/karmada-diff/karmada-official] [12e8f01d *]
-> % kind get clusters
jw-karmada-host
jw-member1-push
jw-member2-push
jw-member3-pull
karmada-host
member1
member2
member3
jw@ecs-3fa1 [03:32:29 PM] [~/workspace/git/karmada-diff/karmada-official] [12e8f01d *]
-> % kind delete cluster --name jw-member1-push
Deleting cluster "jw-member1-push" ...
jw@ecs-3fa1 [03:32:33 PM] [~/workspace/git/karmada-diff/karmada-official] [12e8f01d *]
-> % kubectl get clusters --watch
NAME              VERSION   MODE   READY   AGE
jw-member1-push   v1.23.4   Push   True    57s
jw-member2-push   v1.23.4   Push   True    52s
jw-member3-pull   v1.23.4   Pull   True    25s
jw-member1-push   v1.23.4   Push   False   117s
jw-member1-push   v1.23.4   Push   False   117s

@RainbowMango
Member

@alex-wong123, could you try again following @chaunceyjiang's recommendation above?

@alex-wong123
Author

Thanks for all the replies, I'll try @chaunceyjiang's recommendation.

@alex-wong123
Author

Thanks everyone, it works after following @chaunceyjiang's recommendation.

@chaunceyjiang
Member

Hi @RainbowMango @alex-wong123, I suggest we create a known-issues document, like metrics-server's KNOWN_ISSUES.
The current issue is a good example.

@RainbowMango
Member

Yes!!
Where should we put it? Any suggestions?

By the way, what's the difference between an FAQ and known-issues?

@alex-wong123
Author

@chaunceyjiang Good idea

@RainbowMango
Member

cc @Poor12 for suggestions.

@alex-wong123
Author

Yes!! Where should we put it? Any suggestions?

By the way, what's the difference between an FAQ and known-issues?

My personal opinion is that an FAQ is generally about concepts, while known-issues covers problems encountered in actual use.

@chaunceyjiang
Member

My personal opinion is that an FAQ is generally about concepts, while known-issues covers problems encountered in actual use.

+1

Maybe these are good references:

kind:
https://kind.sigs.k8s.io/docs/user/known-issues/
https://github.com/kubernetes-sigs/kind/blob/main/site/content/docs/user/known-issues.md

metrics-server:
https://github.com/kubernetes-sigs/metrics-server/blob/master/KNOWN_ISSUES.md

metallb:
https://metallb.universe.tf/configuration/calico/

@Poor12
Member

Poor12 commented Sep 23, 2022

Yeah, I suggest putting it in https://karmada.io/docs/troubleshooting/. We can provide a list and record the corresponding workarounds, just as @chaunceyjiang mentioned.

@RainbowMango
Member

Troubleshooting sounds good to me.
@chaunceyjiang, what do you think? Would you like to send a PR for this?

@chaunceyjiang
Member

ok

@RainbowMango
Member

/reopen
/assign @chaunceyjiang
Thanks.

@karmada-bot
Collaborator

@RainbowMango: Reopened this issue.

In response to this:

/reopen
/assign @chaunceyjiang
Thanks.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
