ETCD-3.5.5 : panic: failed to recover v3 backend from snapshot #14749
Comments
Hi @serathius Thanks for your prompt reply. Thanks,
cc @ahrtr
@IamSatyaonline previously when you ran into #14569, what did you do before upgrading to 3.5.5? Did you resolve the issue before upgrading?
Based on the logs and timestamps, it looks like the user ran into the same revision inconsistency issue that was resolved in #14733; the de-fragmentation operation was somehow terminated. @IamSatyaonline Please double-check this and clarify why the de-fragmentation operation was terminated. FYI: https://github.com/ahrtr/etcd-issues/tree/master/issues/revision_inconsistency
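For readers hitting the same panic: the revision-inconsistency check referenced above boils down to comparing what the backend db believes it has applied with what the raft log and snapshot say. Below is a minimal, unofficial sketch of reading those values from the db's "meta" bucket with bbolt. The db path (/data/member/snap/db, derived from ETCD_DATA_DIR=/data in this report) and the on-disk layout are assumptions based on etcd 3.5, so treat it only as a starting point and always work on a copy of the data directory.

```go
// Sketch: open the etcd backend db read-only and print the consistent_index
// and term recorded in the "meta" bucket. Assumes the etcd 3.5 on-disk layout
// and ETCD_DATA_DIR=/data (so the db lives at /data/member/snap/db).
package main

import (
	"encoding/binary"
	"fmt"
	"log"
	"time"

	bolt "go.etcd.io/bbolt"
)

func main() {
	db, err := bolt.Open("/data/member/snap/db", 0400,
		&bolt.Options{ReadOnly: true, Timeout: 5 * time.Second})
	if err != nil {
		log.Fatalf("open db: %v", err)
	}
	defer db.Close()

	if err := db.View(func(tx *bolt.Tx) error {
		meta := tx.Bucket([]byte("meta"))
		if meta == nil {
			return fmt.Errorf("meta bucket not found")
		}
		// Both values are stored as 8-byte big-endian integers.
		if v := meta.Get([]byte("consistent_index")); len(v) == 8 {
			fmt.Println("consistent_index:", binary.BigEndian.Uint64(v))
		}
		if v := meta.Get([]byte("term")); len(v) == 8 {
			fmt.Println("term:", binary.BigEndian.Uint64(v))
		}
		return nil
	}); err != nil {
		log.Fatalf("read meta: %v", err)
	}
}
```

If the consistent_index printed here is far behind the index of the newest snapshot in /data/member/snap, that would be consistent with the symptom described in the linked revision_inconsistency write-up.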
@IamSatyaonline Please also do the following checks:
Hi @ahrtr Thanks,
Could you please respond to my above two comments?
Hi @ahrtr It's difficult for us to run etcd-dump-logs as our deployment is not up due to the panic issue. Thanks,
Hi @ahrtr We can't run etcd-dump-logs in our deployment, as the deployment is not up due to the panic issue. Thanks,
Executing etcd-dump-logs doesn't require the etcd instance to be running. Please execute
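For reference, etcd-dump-logs (under tools/ in the etcd repository) reads the WAL directly from disk, which is why a running member is not needed. A rough, unofficial sketch of the same idea using the 3.5 wal package is below; the WAL path is an assumption derived from ETCD_DATA_DIR=/data.

```go
// Sketch: open the WAL read-only (no running etcd required) and print the
// hard state plus the range of entry indexes it contains. Import paths match
// the etcd 3.5 module layout.
package main

import (
	"fmt"
	"log"

	"go.etcd.io/etcd/server/v3/wal"
	"go.etcd.io/etcd/server/v3/wal/walpb"
	"go.uber.org/zap"
)

func main() {
	// An empty walpb.Snapshot asks for the WAL from its first record; pass a
	// real snapshot index/term here if older segments have been rotated away.
	w, err := wal.OpenForRead(zap.NewExample(), "/data/member/wal", walpb.Snapshot{})
	if err != nil {
		log.Fatalf("open wal: %v", err)
	}
	defer w.Close()

	_, state, ents, err := w.ReadAll()
	if err != nil {
		log.Fatalf("read wal: %v", err)
	}
	fmt.Printf("hard state: term=%d commit=%d\n", state.Term, state.Commit)
	if len(ents) > 0 {
		fmt.Printf("%d entries, index %d..%d\n",
			len(ents), ents[0].Index, ents[len(ents)-1].Index)
	}
}
```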
Hi @ahrtr Thanks,
We do not need the deployment. Instead, we just need the data directory of etcd. Do you mean the data has already been removed? Or that the storage supporting the dynamic volume consumed by the deployment has been removed?
Just as I mentioned in #14749 (comment), it's likely the same issue as #14733, but I need to double-check, and unfortunately you can't provide the info I requested.
Is it easy to reproduce this issue?
We have cleaned up the storage and all the data and reinstalled the new chart, so we don't have the snap and WAL data.
No, it's not easy to reproduce this issue.
Thanks @IamSatyaonline for the feedback. I'll close this issue for now. Please feel free to reopen it or create a new one if you reproduce this issue again, and please keep the environment once you do. Thanks.
Hi @ahrtr
As you mentioned, it's the same issue as #14733, which was fixed in 3.5.5, but we are able to reproduce it in ETCD-3.5.5; unfortunately, we don't have the WAL and snap data. Do you still think that there is no issue in ETCD-3.5.5 as we have reported? Thanks,
The issue is fixed in 3.5.6.
It turned out to be #14382 and was fixed by using https://github.com/ahrtr/etcd-issues/blob/b221ffdee411e9dc1715d329f5e67f41366012b3/etcd/etcd-db-editor/main.go#L16-L28. I must say it is a life-saver; otherwise we would have had to recreate the entire Kubernetes cluster.
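For anyone who lands here with the same symptom: the linked etcd-db-editor is the tool that was actually used. What follows is only a heavily simplified sketch of the general idea, rewriting the consistent_index/term stored in the db's "meta" bucket so the backend and the raft state agree again. The path, key names, and values below are assumptions, and hand-editing the db is inherently risky, so always operate on a copy of the data directory and prefer the linked tool.

```go
// Sketch only, NOT the linked etcd-db-editor: overwrite consistent_index and
// term in the "meta" bucket of a *copy* of the backend db. The values below
// are placeholders and would have to be derived from the member's WAL/snapshot.
package main

import (
	"encoding/binary"
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	const (
		newConsistentIndex uint64 = 123456 // placeholder: derive from the WAL
		newTerm            uint64 = 7      // placeholder
	)

	// Work on a copy of the db, never the live file.
	db, err := bolt.Open("/tmp/db-copy", 0600, nil)
	if err != nil {
		log.Fatalf("open db: %v", err)
	}
	defer db.Close()

	if err := db.Update(func(tx *bolt.Tx) error {
		meta := tx.Bucket([]byte("meta"))
		if meta == nil {
			return bolt.ErrBucketNotFound
		}
		// bbolt requires Put values to stay valid for the whole transaction,
		// so use a separate buffer for each key.
		idxBuf := make([]byte, 8)
		binary.BigEndian.PutUint64(idxBuf, newConsistentIndex)
		if err := meta.Put([]byte("consistent_index"), idxBuf); err != nil {
			return err
		}
		termBuf := make([]byte, 8)
		binary.BigEndian.PutUint64(termBuf, newTerm)
		return meta.Put([]byte("term"), termBuf)
	}); err != nil {
		log.Fatalf("update meta: %v", err)
	}
}
```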
What happened?
We are trying to install a helm chart with three members of the ETCD service; ETCD-3.5.5 is used in our helm chart. Our deployment is failing with the error below, "panic: failed to recover v3 backend from snapshot". We are using the latest version of ETCD, 3.5.5.
panic: failed to recover v3 backend from snapshot
goroutine 1 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc0002cc000, 0xc000536640, 0x1, 0x1)
/usr/local/google/home/siarkowicz/.gvm/pkgsets/go1.16.15/global/pkg/mod/go.uber.org/zap@v1.17.0/zapcore/entry.go:234 +0x58d
go.uber.org/zap.(*Logger).Panic(0xc00065a050, 0x124a82a, 0x2a, 0xc000536640, 0x1, 0x1)
/usr/local/google/home/siarkowicz/.gvm/pkgsets/go1.16.15/global/pkg/mod/go.uber.org/zap@v1.17.0/logger.go:227 +0x85
go.etcd.io/etcd/server/v3/etcdserver.NewServer(0xc00006a84a, 0x26, 0x0, 0x0, 0x0, 0x0, 0xc0001499e0, 0x1, 0x1, 0xc000149c20, ...)
/tmp/etcd-release-3.5.5/etcd/release/etcd/server/etcdserver/server.go:516 +0x1676
go.etcd.io/etcd/server/v3/embed.StartEtcd(0xc0002c8700, 0xc00033a000, 0x0, 0x0)
/tmp/etcd-release-3.5.5/etcd/release/etcd/server/embed/etcd.go:243 +0xef8
go.etcd.io/etcd/server/v3/etcdmain.startEtcd(0xc0002c8700, 0x121e47b, 0x6, 0xc000533101, 0x2)
/tmp/etcd-release-3.5.5/etcd/release/etcd/server/etcdmain/etcd.go:228 +0x32
go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2(0xc00003c1f0, 0x1, 0x1)
/tmp/etcd-release-3.5.5/etcd/release/etcd/server/etcdmain/etcd.go:123 +0x24da
go.etcd.io/etcd/server/v3/etcdmain.Main(0xc00003c1f0, 0x1, 0x1)
/tmp/etcd-release-3.5.5/etcd/release/etcd/server/etcdmain/main.go:40 +0x13f
main.main()
/tmp/etcd-release-3.5.5/etcd/release/etcd/server/main.go:32 +0x45
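The panic above is raised while NewServer tries to bring the v3 backend (the bbolt db) up to the latest raft snapshot and cannot. As a rough, unofficial way to see the gap, the sketch below loads the newest snapshot from the snap directory and prints its index; comparing that with the consistent_index from the meta-bucket sketch earlier in the thread shows how far the backend lags. The path assumes ETCD_DATA_DIR=/data and the etcd 3.5 import paths.

```go
// Sketch: load the most recent raft snapshot from the snap directory and
// print its index/term. Assumes ETCD_DATA_DIR=/data and etcd 3.5 packages.
package main

import (
	"fmt"
	"log"

	"go.etcd.io/etcd/server/v3/etcdserver/api/snap"
	"go.uber.org/zap"
)

func main() {
	ss := snap.New(zap.NewExample(), "/data/member/snap")
	s, err := ss.Load() // returns the newest valid *.snap file
	if err != nil {
		log.Fatalf("load snapshot: %v", err)
	}
	fmt.Printf("latest snapshot: index=%d term=%d\n",
		s.Metadata.Index, s.Metadata.Term)
}
```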
What did you expect to happen?
There should not be any issue with the ETCD service at the time of installation, and ETCD should come up without any issue.
It seems this issue has already been fixed, but we are facing the same issue in ETCD-3.5.5.
How can we reproduce it (as minimally and precisely as possible)?
We don't know the steps to reproduce it; the issue is not easy to reproduce.
Anything else we need to know?
No response
Etcd version (please run commands below)
We can't share the commands' output as our deployment is not coming up.
Etcd configuration (command line flags or environment variables)
ETCD_AUTO_COMPACTION_RETENTION: 100
ETCD_CERT_FILE: /run/sec/certs/server/srvcert.pem
ETCD_PEER_AUTO_TLS: true
ETCD_INITIAL_CLUSTER: etcd-0=https
ETCD_INITIAL_CLUSTER_TOKEN: etcd
ETCD_LISTEN_CLIENT_URLS: https
ETCD_MAX_SNAPSHOTS: 3
ETCD_MAX_WALS: 3
ETCD_ADVERTISE_CLIENT_URLS: https
ETCD_ENABLE_V2: false
ETCD_KEY_FILE: /run/sec/certs/server/srvprivkey.pem
ETCD_SNAPSHOT_COUNT: 5000
ETCD_AUTO_COMPACTION_MODE: revision
ETCD_CLIENT_CERT_AUTH: true
ETCD_ELECTION_TIMEOUT: 1000
ETCD_HEARTBEAT_INTERVAL: 100
ETCD_INITIAL_CLUSTER_STATE: new
ETCD_NAME: etcd-0
ETCD_QUOTA_BACKEND_BYTES: 268435456
ETCD_TRUSTED_CA_FILE: /data/combinedca/cacertbundle.pem
ETCD_DATA_DIR: /data
ETCD_INITIAL_ADVERTISE_PEER_URLS: https
ETCD_LISTEN_PEER_URLS: https
ETCD_METRICS: basic
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
We can't share the commands' output as our deployment is not coming up.
Relevant log output