kubeadm upgrade from v1.28.0 to v1.28.3 fails #2957
There are no SIG labels on this issue. Please add an appropriate label by using one of the following commands:
Please see the group list for a listing of the SIGs, working groups, and committees available. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate triage label and provide further guidance.
/transfer kubeadm
can you share full logs here or in a github Gist maybe?
if the flag is set to false, you mean?
we have our 1.27.latest -> 1.28.latest upgrade tests working fine, but etcd is not upgraded there, because the etcd version is the same between 1.27 and 1.28:
our 1.28.latest -> 1.29.latest upgrade works as well, and there an actual etcd upgrade happens:
Hello @neolit123
You are absolutely right. Here is a log:
Thank you in advance
thanks for the logs, i will try to reproduce this locally.
i was unable to reproduce the bug. here are my steps:
relevant etcd logs from upgrade:
notice that etcd is not upgraded in my case.
this is expected, as the generated manifests between .0 and .3 are the same. can you share your kubeadm cluster configuration? also, whether you are passing --config to the upgrade command.
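For context, kubeadm decides whether to replace the etcd static pod by comparing the manifest currently on disk against a freshly generated one. The sketch below is a simplified stand-in for that idea, not kubeadm's actual code (the function names here are invented); it shows how a field injected by defaulting on only one side makes the hashes differ even when the etcd version is unchanged.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// manifestHash returns a hex digest of a rendered static-pod manifest.
// kubeadm's real logic hashes the full Pod object; this is a stand-in.
func manifestHash(manifest []byte) string {
	return fmt.Sprintf("%x", sha256.Sum256(manifest))
}

// needsUpgrade reports whether the newly generated manifest differs from
// the one currently on disk. If defaulting injects extra fields into one
// side only (the bug discussed in this thread), this returns true even
// though nothing meaningful changed.
func needsUpgrade(current, generated []byte) bool {
	return manifestHash(current) != manifestHash(generated)
}

func main() {
	onDisk := []byte("image: registry.k8s.io/etcd:3.5.9-0\n")
	// same image, but a defaulted field was injected into the rendered copy
	rendered := []byte("image: registry.k8s.io/etcd:3.5.9-0\nrestartPolicy: Always\n")
	fmt.Println(needsUpgrade(onDisk, rendered)) // prints true: spurious upgrade triggered
}
```

This is why identical .0 and .3 manifests should result in etcd being skipped, and why the stray defaults break that expectation.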
but, we did have a major bug related to the etcd hash comparison not so long ago... so it might be best if more people look at this. @chendave that was due to the mistaken import of internal defaulters.
IIRC, the bug will be triggered when the user's cluster was already upgraded to … For etcd, there is a new version …
I think there is nothing we can do if the cluster is on a 1.28 patch which already has the change (kubernetes/kubernetes#118867); skipping the etcd upgrade is the best choice here.
we patched it for 1.29 here: then we backported it for 1.28 here: that was on September 14 and it should be part of 1.28.3, but not of 1.28.2, if i'm reading the history of the release-1.28 branch correctly: so in theory there should be no problem for the .3 upgrade.
it sounds like a valid workaround, but it becomes a problem when we have to direct many users to a single ticket (this one). it's strange because it should not happen, and i confirmed that locally with .3.
the ticket here is about "v1.28.0 to v1.28.3"; the fix there (kubernetes/kubernetes#120561) needs to be present in both the initial and destination versions. even though v1.28.3 has the cherry-pick included, v1.28.0 still has the problematic code, so the upgrade tends to fail.
@alexarefev another workaround is to patch your etcd.yaml and remove all the defaults there before the upgrade, see #2927 (comment).
Hi @neolit123
init-config:
Hi @chendave!
thanks, these are just default settings for etcd from the kubeadm config. related to:
yes, we were seeing these defaults, but i can't see them with the official 1.28.0 binary.
@alexarefev what etcd.yaml are you getting if you create a new cluster with kubeadm init from v1.28.0?
@neolit123 the following
we have exactly the same binary, but i'm getting a different etcd.yaml (minus the IP diff). we saw similar strange behavior when the bug was found.

```diff
@@ -2,7 +2,7 @@ apiVersion: v1
 kind: Pod
 metadata:
   annotations:
-    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://10.0.2.15:2379
+    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.56.106:2379
   creationTimestamp: null
   labels:
     component: etcd
@@ -13,19 +13,19 @@ spec:
   containers:
   - command:
     - etcd
-    - --advertise-client-urls=https://10.0.2.15:2379
+    - --advertise-client-urls=https://192.168.56.106:2379
     - --cert-file=/etc/kubernetes/pki/etcd/server.crt
     - --client-cert-auth=true
     - --data-dir=/var/lib/etcd
     - --experimental-initial-corrupt-check=true
     - --experimental-watch-progress-notify-interval=5s
-    - --initial-advertise-peer-urls=https://10.0.2.15:2380
-    - --initial-cluster=lubo-it=https://10.0.2.15:2380
+    - --initial-advertise-peer-urls=https://192.168.56.106:2380
+    - --initial-cluster=ubuntu=https://192.168.56.106:2380
     - --key-file=/etc/kubernetes/pki/etcd/server.key
-    - --listen-client-urls=https://127.0.0.1:2379,https://10.0.2.15:2379
+    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.56.106:2379
     - --listen-metrics-urls=http://127.0.0.1:2381
-    - --listen-peer-urls=https://10.0.2.15:2380
-    - --name=lubo-it
+    - --listen-peer-urls=https://192.168.56.106:2380
+    - --name=ubuntu
     - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
     - --peer-client-cert-auth=true
     - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
@@ -43,6 +43,7 @@ spec:
         scheme: HTTP
       initialDelaySeconds: 10
       periodSeconds: 10
+      successThreshold: 1
       timeoutSeconds: 15
     name: etcd
     resources:
@@ -58,18 +59,26 @@ spec:
         scheme: HTTP
       initialDelaySeconds: 10
       periodSeconds: 10
+      successThreshold: 1
       timeoutSeconds: 15
+    terminationMessagePath: /dev/termination-log
+    terminationMessagePolicy: File
     volumeMounts:
     - mountPath: /var/lib/etcd
       name: etcd-data
     - mountPath: /etc/kubernetes/pki/etcd
       name: etcd-certs
+  dnsPolicy: ClusterFirst
+  enableServiceLinks: true
   hostNetwork: true
   priority: 2000001000
   priorityClassName: system-node-critical
+  restartPolicy: Always
+  schedulerName: default-scheduler
   securityContext:
     seccompProfile:
       type: RuntimeDefault
+  terminationGracePeriodSeconds: 30
   volumes:
   - hostPath:
       path: /etc/kubernetes/pki/etcd
```

this means the problem might happen for some 1.28.0 users, but not for others... thanks for the details, let's keep this ticket open until more users upgrade to 1.28.3.
this is a really tricky issue; multiple factors are at play here: golang version, kubeadm binary with the problematic code, OS distro, dockerized env, etc. Sometimes we cannot reproduce it with the same binary in a specific env. You can just patch your etcd.yaml to remove the defaults, as covered in #2927 (comment), before the upgrade. Unfortunately, we can only track this issue and guide end users through the patch upgrade. @neolit123 do you think we need to post a guide somewhere to help others work through it?
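The manual workaround amounts to deleting the defaulted fields from /etc/kubernetes/manifests/etcd.yaml before upgrading. A minimal sketch of that line-filtering step is below; the field list is taken from the diff earlier in this thread and is illustrative, not exhaustive, and the helper names are invented (in practice users would just hand-edit the file as #2927 describes).

```go
package main

import (
	"fmt"
	"strings"
)

// fieldsAddedByDefaulting lists keys that the mistakenly linked internal
// defaulters injected into etcd.yaml, per the diff earlier in this thread.
var fieldsAddedByDefaulting = []string{
	"successThreshold:",
	"terminationMessagePath:",
	"terminationMessagePolicy:",
	"dnsPolicy:",
	"enableServiceLinks:",
	"restartPolicy:",
	"schedulerName:",
	"terminationGracePeriodSeconds:",
}

// stripDefaults drops any manifest line whose key matches one of the
// defaulted fields, leaving everything else untouched.
func stripDefaults(manifest string) string {
	var kept []string
	for _, line := range strings.Split(manifest, "\n") {
		trimmed := strings.TrimSpace(line)
		drop := false
		for _, f := range fieldsAddedByDefaulting {
			if strings.HasPrefix(trimmed, f) {
				drop = true
				break
			}
		}
		if !drop {
			kept = append(kept, line)
		}
	}
	return strings.Join(kept, "\n")
}

func main() {
	in := "  periodSeconds: 10\n  successThreshold: 1\n  timeoutSeconds: 15\n"
	fmt.Print(stripDefaults(in)) // successThreshold line removed, rest kept
}
```

Note this only removes whole scalar lines; it is not a YAML-aware edit, which is why hand-editing against a known-good manifest remains the safer route.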
I missed this, I will update the page to include this issue. |
/assign for the doc update |
Just some of my thoughts on this: can we have defaults included for both "init" and "dest"? The defaults are not a qualifying reason for an upgrade anyway. I did some tests on my side:

```diff
--- a/cmd/kubeadm/app/util/marshal.go
+++ b/cmd/kubeadm/app/util/marshal.go
@@ -33,6 +33,7 @@ import (
 	kubeadmapi "k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm"
 	"k8s.io/kubernetes/cmd/kubeadm/app/constants"
+	v1 "k8s.io/kubernetes/pkg/apis/core/v1"
 )

 // MarshalToYaml marshals an object into yaml.
@@ -57,7 +58,7 @@
 // UniversalUnmarshal unmarshals YAML or JSON into a runtime.Object using the universal deserializer.
 func UniversalUnmarshal(buffer []byte) (runtime.Object, error) {
 	codecs := clientsetscheme.Codecs
-	decoder := codecs.UniversalDeserializer()
+	decoder := codecs.UniversalDecoder(v1.SchemeGroupVersion)
 	obj, _, err := decoder.Decode(buffer, nil, nil)
 	if err != nil {
 		return nil, errors.Wrapf(err, "failed to decode %s into runtime.Object", buffer)
```

According to the doc comments for UniversalDecoder:

```go
// versions of objects to return - by default, runtime.APIVersionInternal is used. If any versions are specified,
// unrecognized groups will be returned in the version they are encoded as (no conversion). This decoder performs
// defaulting.
//
// TODO: the decoder will eventually be removed in favor of dealing with objects in their versioned form
// TODO: only accept a group versioner
func (f CodecFactory) UniversalDecoder(versions ...schema.GroupVersion) runtime.Decoder {
```

@liggitt am I reading it wrong? Shouldn't this codec apply defaults? Also, I'm missing the context of removing the import of "k8s.io/kubernetes/pkg/apis/core/v1" in the first place; can anyone share why the import is not allowed in kubeadm? Is it just because of the defaulting, as in this issue?
It applies the defaults it knows about, i.e. the defaulting functions registered into the codec. Whether defaulting functions are registered or not depends on which packages are linked into the binary. The defaulting functions for core APIs are defined in the k8s.io/kubernetes/... API packages and are only intended for use by kube-apiserver.
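A toy analogy of the point above: the same bytes decode into different objects depending on which defaulting functions happen to be registered in the binary. The types and the registry below are invented for illustration; they are not the real apimachinery API, just a sketch of the link-time side effect being described.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PodSpec is a tiny stand-in for a Kubernetes object, for illustration only.
type PodSpec struct {
	Image         string `json:"image"`
	RestartPolicy string `json:"restartPolicy,omitempty"`
}

// defaulters mimics a codec's registry of defaulting functions. In real
// Kubernetes these live in k8s.io/kubernetes/pkg/apis/... and are only
// registered if those packages are linked into the binary.
var defaulters []func(*PodSpec)

// decode unmarshals and then runs whatever defaulters are registered.
func decode(data []byte) (*PodSpec, error) {
	var p PodSpec
	if err := json.Unmarshal(data, &p); err != nil {
		return nil, err
	}
	for _, d := range defaulters {
		d(&p) // defaulting happens only if something registered a defaulter
	}
	return &p, nil
}

func main() {
	raw := []byte(`{"image":"etcd:3.5.9-0"}`)

	p, _ := decode(raw)
	fmt.Println(p.RestartPolicy == "") // true: no defaulters linked in

	// Importing the apiserver-internal package registers defaulters as a
	// side effect; we simulate that registration here.
	defaulters = append(defaulters, func(p *PodSpec) {
		if p.RestartPolicy == "" {
			p.RestartPolicy = "Always"
		}
	})
	p, _ = decode(raw)
	fmt.Println(p.RestartPolicy) // Always: same bytes, different object
}
```

That is exactly how the stray import in kubeadm changed the rendered etcd.yaml without any code in kubeadm explicitly setting those fields.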
TL;DR there is a plan to extract components from k/k that are considered clients / out-of-tree - kubectl, kubeadm, etc. |
For the issue which was reported recently (kubernetes/kubeadm#2957), we'd better provide some tips to work around this known issue.
/lifecycle frozen
For the issue which was reported recently (kubernetes/kubeadm#2957), we'd better provide some tips to work around this known issue. Signed-off-by: Dave Chen <dave.chen@arm.com>
1.28.10 is the latest. Closing until further notice.
What happened?
The following command fails with the error below if kubeadm is v1.28.3. kubeadm v1.28.0 upgrades the cluster successfully.

What did you expect to happen?
kubeadm v1.28.3 upgrades the cluster successfully.

How can we reproduce it (as minimally and precisely as possible)?
Download kubeadm v1.28.3 and run an upgrade of Kubernetes v1.28.0.

Anything else we need to know?
The issue might be fixed by the --etcd-upgrade flag.

Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)