ETCD-3.5.5 : panic: failed to recover v3 backend from snapshot #14749
Comments
Hi @serathius Thanks for your prompt reply. Thanks,
cc @ahrtr
@IamSatyaonline previously when you ran into #14569, what did you do before upgrading to 3.5.5? Did you resolve the issue before upgrading?
Based on the logs and timestamps, it looks like the user ran into the same revision inconsistency issue that was resolved in #14733; the de-fragmentation operation was somehow terminated. @IamSatyaonline Please double-check this and clarify why the de-fragmentation operation was terminated. FYI: https://github.com/ahrtr/etcd-issues/tree/master/issues/revision_inconsistency
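For readers hitting the same panic: the revision-inconsistency check referenced above boils down to comparing what the backend db believes it has applied with what the raft log and snapshot say. Below is a minimal, unofficial sketch of reading those values from the db's "meta" bucket with bbolt. The db path (/data/member/snap/db, derived from ETCD_DATA_DIR=/data in this report) and the on-disk layout are assumptions based on etcd 3.5, so treat it only as a starting point and always work on a copy of the data directory.

```go
// Sketch: open the etcd backend db read-only and print the consistent_index
// and term recorded in the "meta" bucket. Assumes the etcd 3.5 on-disk layout
// and ETCD_DATA_DIR=/data (so the db lives at /data/member/snap/db).
package main

import (
	"encoding/binary"
	"fmt"
	"log"
	"time"

	bolt "go.etcd.io/bbolt"
)

func main() {
	db, err := bolt.Open("/data/member/snap/db", 0400,
		&bolt.Options{ReadOnly: true, Timeout: 5 * time.Second})
	if err != nil {
		log.Fatalf("open db: %v", err)
	}
	defer db.Close()

	if err := db.View(func(tx *bolt.Tx) error {
		meta := tx.Bucket([]byte("meta"))
		if meta == nil {
			return fmt.Errorf("meta bucket not found")
		}
		// Both values are stored as 8-byte big-endian integers.
		if v := meta.Get([]byte("consistent_index")); len(v) == 8 {
			fmt.Println("consistent_index:", binary.BigEndian.Uint64(v))
		}
		if v := meta.Get([]byte("term")); len(v) == 8 {
			fmt.Println("term:", binary.BigEndian.Uint64(v))
		}
		return nil
	}); err != nil {
		log.Fatalf("read meta: %v", err)
	}
}
```

If the consistent_index printed here is far behind the index of the newest snapshot in /data/member/snap, that would be consistent with the symptom described in the linked revision_inconsistency write-up.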
@IamSatyaonline Please also do the following checks:
Hi @ahrtr Thanks,
Could you please respond to my above two comments?
Hi @ahrtr It's difficult for us to run etcd-dump-logs as our deployment is not up due to the panic issue. Thanks,
Hi @ahrtr We can't run etcd-dump-logs in our deployment, as the deployment is not up due to the panic issue. Thanks,
Executing etcd-dump-logs doesn't require the etcd instance to be running. Please execute
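For reference, etcd-dump-logs (under tools/ in the etcd repository) reads the WAL directly from disk, which is why a running member is not needed. A rough, unofficial sketch of the same idea using the 3.5 wal package is below; the WAL path is an assumption derived from ETCD_DATA_DIR=/data.

```go
// Sketch: open the WAL read-only (no running etcd required) and print the
// hard state plus the range of entry indexes it contains. Import paths match
// the etcd 3.5 module layout.
package main

import (
	"fmt"
	"log"

	"go.etcd.io/etcd/server/v3/wal"
	"go.etcd.io/etcd/server/v3/wal/walpb"
	"go.uber.org/zap"
)

func main() {
	// An empty walpb.Snapshot asks for the WAL from its first record; pass a
	// real snapshot index/term here if older segments have been rotated away.
	w, err := wal.OpenForRead(zap.NewExample(), "/data/member/wal", walpb.Snapshot{})
	if err != nil {
		log.Fatalf("open wal: %v", err)
	}
	defer w.Close()

	_, state, ents, err := w.ReadAll()
	if err != nil {
		log.Fatalf("read wal: %v", err)
	}
	fmt.Printf("hard state: term=%d commit=%d\n", state.Term, state.Commit)
	if len(ents) > 0 {
		fmt.Printf("%d entries, index %d..%d\n",
			len(ents), ents[0].Index, ents[len(ents)-1].Index)
	}
}
```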
Hi @ahrtr Thanks,
We do not need the deployment. Instead, we just need the data directory of etcd. Do you mean the data has already been removed? Or that the storage supporting the dynamic volume consumed by the deployment has been removed?
Just as I mentioned in #14749 (comment), it's likely the same issue as #14733, but I need to double-check, and unfortunately you can't provide the info I requested.
Is it easy to reproduce this issue?
We have cleaned up the storage and all the data and reinstalled the new chart, so we don't have the snap and WAL data.
No, it's not easy to reproduce this issue.
Thanks @IamSatyaonline for the feedback. I'll close this issue for now. Please feel free to reopen it or create a new one if you reproduce this issue again, and please keep the environment once you do. Thanks.
Hi @ahrtr
As you mentioned, it's the same issue as #14733, which was fixed in 3.5.5, but we are able to reproduce it in ETCD-3.5.5; unfortunately, we don't have the WAL and snap data. Do you still think that there is no issue in ETCD-3.5.5 as we have reported? Thanks,
The issue is fixed in 3.5.6.
It turned out to be #14382 and was fixed by using https://github.com/ahrtr/etcd-issues/blob/b221ffdee411e9dc1715d329f5e67f41366012b3/etcd/etcd-db-editor/main.go#L16-L28. I must say it is a life-saver; otherwise we would have had to recreate the entire Kubernetes cluster.
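For anyone who lands here with the same symptom: the linked etcd-db-editor is the tool that was actually used. What follows is only a heavily simplified sketch of the general idea, rewriting the consistent_index/term stored in the db's "meta" bucket so the backend and the raft state agree again. The path, key names, and values below are assumptions, and hand-editing the db is inherently risky, so always operate on a copy of the data directory and prefer the linked tool.

```go
// Sketch only, NOT the linked etcd-db-editor: overwrite consistent_index and
// term in the "meta" bucket of a *copy* of the backend db. The values below
// are placeholders and would have to be derived from the member's WAL/snapshot.
package main

import (
	"encoding/binary"
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	const (
		newConsistentIndex uint64 = 123456 // placeholder: derive from the WAL
		newTerm            uint64 = 7      // placeholder
	)

	// Work on a copy of the db, never the live file.
	db, err := bolt.Open("/tmp/db-copy", 0600, nil)
	if err != nil {
		log.Fatalf("open db: %v", err)
	}
	defer db.Close()

	if err := db.Update(func(tx *bolt.Tx) error {
		meta := tx.Bucket([]byte("meta"))
		if meta == nil {
			return bolt.ErrBucketNotFound
		}
		// bbolt requires Put values to stay valid for the whole transaction,
		// so use a separate buffer for each key.
		idxBuf := make([]byte, 8)
		binary.BigEndian.PutUint64(idxBuf, newConsistentIndex)
		if err := meta.Put([]byte("consistent_index"), idxBuf); err != nil {
			return err
		}
		termBuf := make([]byte, 8)
		binary.BigEndian.PutUint64(termBuf, newTerm)
		return meta.Put([]byte("term"), termBuf)
	}); err != nil {
		log.Fatalf("update meta: %v", err)
	}
}
```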
What happened?
We are trying to install a helm chart with three members of the ETCD service; ETCD-3.5.5 is used in our helm chart. Our deployment is failing with the error below, "panic: failed to recover v3 backend from snapshot". We are using the latest version of ETCD, 3.5.5.
panic: failed to recover v3 backend from snapshot
goroutine 1 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc0002cc000, 0xc000536640, 0x1, 0x1)
/usr/local/google/home/siarkowicz/.gvm/pkgsets/go1.16.15/global/pkg/mod/go.uber.org/zap@v1.17.0/zapcore/entry.go:234 +0x58d
go.uber.org/zap.(*Logger).Panic(0xc00065a050, 0x124a82a, 0x2a, 0xc000536640, 0x1, 0x1)
/usr/local/google/home/siarkowicz/.gvm/pkgsets/go1.16.15/global/pkg/mod/go.uber.org/zap@v1.17.0/logger.go:227 +0x85
go.etcd.io/etcd/server/v3/etcdserver.NewServer(0xc00006a84a, 0x26, 0x0, 0x0, 0x0, 0x0, 0xc0001499e0, 0x1, 0x1, 0xc000149c20, ...)
/tmp/etcd-release-3.5.5/etcd/release/etcd/server/etcdserver/server.go:516 +0x1676
go.etcd.io/etcd/server/v3/embed.StartEtcd(0xc0002c8700, 0xc00033a000, 0x0, 0x0)
/tmp/etcd-release-3.5.5/etcd/release/etcd/server/embed/etcd.go:243 +0xef8
go.etcd.io/etcd/server/v3/etcdmain.startEtcd(0xc0002c8700, 0x121e47b, 0x6, 0xc000533101, 0x2)
/tmp/etcd-release-3.5.5/etcd/release/etcd/server/etcdmain/etcd.go:228 +0x32
go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2(0xc00003c1f0, 0x1, 0x1)
/tmp/etcd-release-3.5.5/etcd/release/etcd/server/etcdmain/etcd.go:123 +0x24da
go.etcd.io/etcd/server/v3/etcdmain.Main(0xc00003c1f0, 0x1, 0x1)
/tmp/etcd-release-3.5.5/etcd/release/etcd/server/etcdmain/main.go:40 +0x13f
main.main()
/tmp/etcd-release-3.5.5/etcd/release/etcd/server/main.go:32 +0x45
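The panic above is raised while NewServer tries to bring the v3 backend (the bbolt db) up to the latest raft snapshot and cannot. As a rough, unofficial way to see the gap, the sketch below loads the newest snapshot from the snap directory and prints its index; comparing that with the consistent_index from the meta-bucket sketch earlier in the thread shows how far the backend lags. The path assumes ETCD_DATA_DIR=/data and the etcd 3.5 import paths.

```go
// Sketch: load the most recent raft snapshot from the snap directory and
// print its index/term. Assumes ETCD_DATA_DIR=/data and etcd 3.5 packages.
package main

import (
	"fmt"
	"log"

	"go.etcd.io/etcd/server/v3/etcdserver/api/snap"
	"go.uber.org/zap"
)

func main() {
	ss := snap.New(zap.NewExample(), "/data/member/snap")
	s, err := ss.Load() // returns the newest valid *.snap file
	if err != nil {
		log.Fatalf("load snapshot: %v", err)
	}
	fmt.Printf("latest snapshot: index=%d term=%d\n",
		s.Metadata.Index, s.Metadata.Term)
}
```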
What did you expect to happen?
There should not be any issue with the ETCD service at the time of installation, and ETCD should come up without any issue.
It seems this issue has already been fixed, but we are facing the same issue in ETCD-3.5.5.
How can we reproduce it (as minimally and precisely as possible)?
We don't know the steps to reproduce it; the issue is not easy to reproduce.
Anything else we need to know?
No response
Etcd version (please run commands below)
We can't share the commands' output as our deployment is not coming up.
Etcd configuration (command line flags or environment variables)
ETCD_AUTO_COMPACTION_RETENTION: 100
ETCD_CERT_FILE: /run/sec/certs/server/srvcert.pem
ETCD_PEER_AUTO_TLS: true
ETCD_INITIAL_CLUSTER: etcd-0=https
ETCD_INITIAL_CLUSTER_TOKEN: etcd
ETCD_LISTEN_CLIENT_URLS: https
ETCD_MAX_SNAPSHOTS: 3
ETCD_MAX_WALS: 3
ETCD_ADVERTISE_CLIENT_URLS: https
ETCD_ENABLE_V2: false
ETCD_KEY_FILE: /run/sec/certs/server/srvprivkey.pem
ETCD_SNAPSHOT_COUNT: 5000
ETCD_AUTO_COMPACTION_MODE: revision
ETCD_CLIENT_CERT_AUTH: true
ETCD_ELECTION_TIMEOUT: 1000
ETCD_HEARTBEAT_INTERVAL: 100
ETCD_INITIAL_CLUSTER_STATE: new
ETCD_NAME: etcd-0
ETCD_QUOTA_BACKEND_BYTES: 268435456
ETCD_TRUSTED_CA_FILE: /data/combinedca/cacertbundle.pem
ETCD_DATA_DIR: /data
ETCD_INITIAL_ADVERTISE_PEER_URLS: https
ETCD_LISTEN_PEER_URLS: https
ETCD_METRICS: basic
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
We can't share the commands' output as our deployment is not coming up.
Relevant log output