Improvements for etcd liveness probes #2567
Comments
Adding |
I was under the impression that it does not work. |
How can we easily reproduce it with a simple script? |
I think starting an HA kind cluster and then shutting down the containers and restarting them might be sufficient. I have a feeling it's related to kubernetes-sigs/kind#1689 |
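A rough sketch of that repro, assuming a kind cluster with three control-plane nodes (the cluster name, node container names, and the docker stop/start steps in the comments are illustrative, not a verified script):

```yaml
# ha-cluster.yaml - three control-plane nodes, so etcd needs a 2/3 quorum
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: control-plane
- role: control-plane
- role: worker
# Rough steps (run outside this file):
#   kind create cluster --name ha --config ha-cluster.yaml
#   docker stop ha-control-plane ha-control-plane2 ha-control-plane3    # "power off"
#   docker start ha-control-plane ha-control-plane2 ha-control-plane3   # "power on"
# Then watch whether the etcd static pods keep getting restarted by their
# liveness probes while leader election is still in progress.
```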
looks like etcd 3.6 will have the missing feature that we wanted. we are in code freeze for 1.23, but this seems like something that can be added once 1.24 starts and backported to older releases (assuming it's not a big diff in kubeadm). |
It also depends on when etcd 3.6 will be released. Does etcd follow a defined timeline? Unless I am missing something, I couldn't find much with my web-searching skills. PS: |
Etcd does not have a fixed cadence for minor releases AFAIK. Judging from that, I don't think this will align with k8s 1.24. |
Can we ask the etcd team if they would like to cut a patch release with the said feature? |
I don't think they will agree to that, but it's worth a try if someone wants to do it. |
FYI, backporting the PR to etcd 3.5: etcd-io/etcd#13706 |
cc @serathius |
@ahrtr great. If the etcd backport is accepted we can try backporting a kubeadm etcd bump. |
The backport PR to v3.5 (etcd-io/etcd#13706) was merged, and now we wait for etcd v3.5.3. |
Yes, the PR was merged, and I just submitted another PR pull/13725 to update the 3.5 changelog. |
The etcd bump to 3.5.3 merged in 1.24/master; here are the backports for 1.23 and 1.22: after/if these merge, we would want to backport and enable the new probes conditionally for kubeadm versions that use 3.5.3. |
I think what we have to do to enable the new check is the following in the kubeadm etcd manifest:
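Something like this (a sketch only, assuming the /health?serializable=true query parameter added in etcd 3.5.3 and the default kubeadm etcd metrics port 2381; the exact thresholds and the liveness/startup split are my reading of the linked PRs, not a copy of the final manifest):

```yaml
# /etc/kubernetes/manifests/etcd.yaml (probe fragment of the etcd container)
livenessProbe:
  httpGet:
    host: 127.0.0.1
    # serializable=true: a member that is up but has no raft leader/quorum yet
    # still reports healthy, so it is not killed during leader election
    path: /health?exclude=NOSPACE&serializable=true
    port: 2381
    scheme: HTTP
  failureThreshold: 8
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 15
startupProbe:
  httpGet:
    host: 127.0.0.1
    # startup can keep the stricter (linearizable) check
    path: /health?serializable=false
    port: 2381
    scheme: HTTP
  failureThreshold: 24
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 15
```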
but it must be done only for k8s control plane versions >= 1.22 (if the above cherry-picks merge, that is) |
3.5.3 backports: |
1.25 PR is here: |
The reason is that the PR etcd/pull/13525 isn't backported to 3.5. It's a little subtle. Let's work with an example, assuming there is an etcd cluster with 3 members. There are two cases here:
Probably we need to backport the PR to 3.5 as well. But since it's an enhancement, I need to discuss it with the other etcd maintainers. Please also let me know whether you really need it, or whether you can adjust/update the K8s tests to adapt to case 1. |
The issue that I saw earlier was due to a mistake on my end - using
That sounds like the scenario the OP describes here.
It's unclear to me if these additional changes are needed or not. |
The key point is that etcd can't finish the bootstrap/startup process if the quorum isn't satisfied, so it can't serve any client requests, not even serializable requests. That is exactly what PR 13525 fixed. But once etcd finishes the bootstrap/startup process, it can continue to serve serializable requests even if the quorum isn't satisfied. |
this sounds like a good argument for the backport of etcd-io/etcd#13525 |
Let me submit a PR for the backport and get feedback from the other maintainers. |
@ahrtr NOTE: it closes this issue because there isn't much else we can do here. It follows your recommendation here: |
What keywords did you search in kubeadm issues before filing this one?
This is related to kubernetes/kubernetes#96886
and etcd-io/etcd#13340
Is this a BUG REPORT or FEATURE REQUEST?
FEATURE REQUEST
Versions
kubeadm version (use kubeadm version): v1.22, etcd v3.5.0
What happened?
Under certain cluster conditions, such as an entire cluster being powered off and on, you may not want etcd pods restarted when no raft leader is present, since leader election is taking place. There are more details in etcd-io/etcd#13340, in which we have requested that either a lightweight /ready endpoint is added to etcd, or that /health is allowed to take additional query parameters that would let us relax the constraints. Once that's in place, downstream consumers can use the JSON patch method to patch the etcd static pod (e.g. kubernetes-sigs/cluster-api#4874).
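For illustration, such a patch could look roughly like the following (a sketch using kubeadm's patches mechanism; the file name and the strategic-merge form are illustrative, and the JSON patch form mentioned above achieves the same thing):

```yaml
# patches/etcd+strategic.yaml - applied via `kubeadm init --patches <dir>`
# (or the patches field in the kubeadm configuration); relaxes the liveness
# check so a member that is up but has no raft leader yet is not restarted.
spec:
  containers:
  - name: etcd
    livenessProbe:
      httpGet:
        path: /health?serializable=true
```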
However, we may also want to change the defaults for kubeadm, but we should do some modelling of which state transitions and cluster conditions we care about.
Additionally, I think we will also want to relax the consistency constraints as part of learner mode adoption.
This is mostly a tracking issue for possible changes to etcd that we can consume.
What you expected to happen?
Turned-off clusters can be restarted.
How to reproduce it (as minimally and precisely as possible)?
I have a feeling this is also related to kubernetes-sigs/kind#1689, so improvements may be testable
TODOs
1.25:
[x] kubeadm: add serializable health checks for etcd probes kubernetes#110072
TODO
can be done for 1.24 (that has 3.5.3+) but we need 3.5.3 backports to happen for the older releases:
Automated cherry pick of #109471: etcd: Update to v3.5.3 kubernetes#109532
Automated cherry pick of #109471: etcd: Update to v3.5.3 kubernetes#109533