-
Notifications
You must be signed in to change notification settings - Fork 546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 graceful reset (etcd leave) doesn't work with KubeSpan #8057
Comments
smira
added a commit
to smira/talos
that referenced
this issue
Dec 13, 2023
Fixes siderolabs#8057 I went back and forth on the way to fix it exactly, and ended up with a pretty simple version of a fix. The problem was that discovery service was removing the member at the initial phase of reset, which actually still requires KubeSpan to be up: * leaving `etcd` (need to talk to other members) * stopping pods (might need to talk to Kubernetes API with some CNIs) Now leaving discovery service happens way later, when network interactions are no longer required. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
smira
added a commit
to smira/talos
that referenced
this issue
Dec 13, 2023
Fixes siderolabs#8057 I went back and forth on the way to fix it exactly, and ended up with a pretty simple version of a fix. The problem was that discovery service was removing the member at the initial phase of reset, which actually still requires KubeSpan to be up: * leaving `etcd` (need to talk to other members) * stopping pods (might need to talk to Kubernetes API with some CNIs) Now leaving discovery service happens way later, when network interactions are no longer required. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
smira
added a commit
to smira/talos
that referenced
this issue
Dec 14, 2023
Fixes siderolabs#8057 I went back and forth on the way to fix it exactly, and ended up with a pretty simple version of a fix. The problem was that discovery service was removing the member at the initial phase of reset, which actually still requires KubeSpan to be up: * leaving `etcd` (need to talk to other members) * stopping pods (might need to talk to Kubernetes API with some CNIs) Now leaving discovery service happens way later, when network interactions are no longer required. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com> (cherry picked from commit 10c59a6)
smira
added a commit
to smira/talos
that referenced
this issue
Dec 14, 2023
Fixes siderolabs#8057 I went back and forth on the way to fix it exactly, and ended up with a pretty simple version of a fix. The problem was that discovery service was removing the member at the initial phase of reset, which actually still requires KubeSpan to be up: * leaving `etcd` (need to talk to other members) * stopping pods (might need to talk to Kubernetes API with some CNIs) Now leaving discovery service happens way later, when network interactions are no longer required. Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com> (cherry picked from commit 10c59a6)
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
The machine leaves Discovery Service earlier than leaving
etcd
, so by the time etcd leave happens, links to the controlplane nodes are already broken, and etcd member can't communicate with other peers.The fix is to defer leaving Discovery Service on reset to some later phase, when network connectivity via KubeSpan is no longer needed.
The text was updated successfully, but these errors were encountered: