[db2] pod c-db2w-shared-etcd-0 goes into crash loop #1039

Closed
witekwww opened this issue Oct 9, 2023 · 2 comments
Labels
Triage Issue was triaged and acknowledged

Comments

@witekwww
Contributor

witekwww commented Oct 9, 2023

DB2 was deployed in-cluster using oneclick_add_manage.
DB2 deploys properly (and the installation continues for Manage), but the c-db2w-shared-etcd-0 pod is in CrashLoopBackOff state.

Logs of the pod:
```
+ '[' -z 1 ']'
+ '[' -z c-db2w-shared-etcd ']'
+ ln -sf /persistence/etcd/c-db2w-shared-etcd-0 /var/run/etcd
+ [[ ! -z '' ]]
+ '[' -e /var/run/etcd/default.etcd ']'
+ SET_ID=0
+ '[' 0 -ge 1 ']'
+ PEERS=
++ seq 0 0
+ for i in $(seq 0 $((${INITIAL_CLUSTER_SIZE} - 1)))
+ PEERS=c-db2w-shared-etcd-0=http://c-db2w-shared-etcd-0.c-db2w-shared-etcd:2380
+ dnslookup_etcd
+ i=0
+ [[ 0 -le 59 ]]
+ nslookup c-db2w-shared-etcd-0.c-db2w-shared-etcd
Server: 172.30.0.10
Address: 172.30.0.10#53

Name: c-db2w-shared-etcd-0.c-db2w-shared-etcd.db2u.svc.cluster.local
Address: 10.130.0.29

+ [[ 0 -eq 0 ]]
+ break
+ collect_member
+ exec etcd --name c-db2w-shared-etcd-0 --initial-advertise-peer-urls http://c-db2w-shared-etcd-0.c-db2w-shared-etcd:2380 --listen-peer-urls http://0.0.0.0:2380 --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://c-db2w-shared-etcd-0.c-db2w-shared-etcd:2379 --initial-cluster-token etcd-cluster-1 --initial-cluster c-db2w-shared-etcd-0=http://c-db2w-shared-etcd-0.c-db2w-shared-etcd:2380 --initial-cluster-state new --data-dir /var/run/etcd/default.etcd --enable-v2 --logger=zap
+ member_id_file=/var/run/etcd/member_id
+ [[ ! -f /var/run/etcd/member_id ]]
+ etcdctl member list
{"level":"info","ts":"2023-10-09T07:20:45.208Z","caller":"embed/etcd.go:117","msg":"configuring peer listeners","listen-peer-urls":["http://0.0.0.0:2380"]}
{"level":"info","ts":"2023-10-09T07:20:45.223Z","caller":"embed/etcd.go:127","msg":"configuring client listeners","listen-client-urls":["http://0.0.0.0:2379"]}
{"level":"info","ts":"2023-10-09T07:20:45.223Z","caller":"embed/etcd.go:302","msg":"starting an etcd server","etcd-version":"3.4.14","git-sha":"8a03d2e96","go-version":"go1.12.17","go-os":"linux","go-arch":"amd64","max-cpu-set":12,"max-cpu-available":12,"member-initialized":false,"name":"c-db2w-shared-etcd-0","data-dir":"/var/run/etcd/default.etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/run/etcd/default.etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":100000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://c-db2w-shared-etcd-0.c-db2w-shared-etcd:2380"],"listen-peer-urls":["http://0.0.0.0:2380"],"advertise-client-urls":["http://c-db2w-shared-etcd-0.c-db2w-shared-etcd:2379"],"listen-client-urls":["http://0.0.0.0:2379"],"listen-metrics-urls":[],"cors":[""],"host-whitelist":[""],"initial-cluster":"c-db2w-shared-etcd-0=http://c-db2w-shared-etcd-0.c-db2w-shared-etcd:2380","initial-cluster-state":"new","initial-cluster-token":"etcd-cluster-1","quota-size-bytes":2147483648,"pre-vote":false,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":""}
{"level":"info","ts":"2023-10-09T07:20:45.223Z","caller":"embed/etcd.go:363","msg":"closing etcd server","name":"c-db2w-shared-etcd-0","data-dir":"/var/run/etcd/default.etcd","advertise-peer-urls":["http://c-db2w-shared-etcd-0.c-db2w-shared-etcd:2380"],"advertise-client-urls":["http://c-db2w-shared-etcd-0.c-db2w-shared-etcd:2379"]}
{"level":"info","ts":"2023-10-09T07:20:45.223Z","caller":"embed/etcd.go:367","msg":"closed etcd server","name":"c-db2w-shared-etcd-0","data-dir":"/var/run/etcd/default.etcd","advertise-peer-urls":["http://c-db2w-shared-etcd-0.c-db2w-shared-etcd:2380"],"advertise-client-urls":["http://c-db2w-shared-etcd-0.c-db2w-shared-etcd:2379"]}
{"level":"warn","ts":"2023-10-09T07:20:45.223Z","caller":"etcdmain/etcd.go:176","msg":"failed to start etcd","error":"cannot access data directory: mkdir /var/run/etcd: file exists"}
{"level":"fatal","ts":"2023-10-09T07:20:45.223Z","caller":"etcdmain/etcd.go:271","msg":"discovery failed","error":"cannot access data directory: mkdir /var/run/etcd: file exists","stacktrace":"go.etcd.io/etcd/etcdmain.startEtcdOrProxyV2\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/etcdmain/etcd.go:271\ngo.etcd.io/etcd/etcdmain.Main\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/etcdmain/main.go:46\nmain.main\n\t/tmp/etcd-release-3.4.14/etcd/release/etcd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:200"}
```
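One possible reading of the log, not confirmed anywhere in this thread: the startup script runs `ln -sf /persistence/etcd/c-db2w-shared-etcd-0 /var/run/etcd`, and if the link target does not exist yet, `/var/run/etcd` is a dangling symlink. Creating a directory at that path then fails with "file exists", because the link itself already occupies the name. The sketch below reproduces that behaviour in a scratch directory; the paths under `workdir` are stand-ins for illustration, and nothing here touches a real cluster.

```shell
#!/usr/bin/env bash
# Illustration only: shows how a dangling symlink makes mkdir fail with "File exists".
# Whether this is what happens inside the etcd pod is an assumption.
workdir="$(mktemp -d)"
target="$workdir/persistence/etcd/c-db2w-shared-etcd-0"   # stand-in path, not created yet
link="$workdir/etcd"                                      # stand-in for /var/run/etcd

ln -sf "$target" "$link"        # same pattern as the ln -sf in the trace

# The link exists but points nowhere, so mkdir cannot create a directory at that name.
if mkdir "$link" 2>/dev/null; then
  before="created"
else
  before="failed"
fi

# Once the target directory exists, the link resolves and the data dir can be created.
mkdir -p "$target"
if mkdir -p "$link/default.etcd"; then
  after="created"
else
  after="failed"
fi

echo "before target exists: $before"
echo "after target exists: $after"
rm -rf "$workdir"
```

If this reading is right, it would also explain why the pod only starts cleanly once the storage behind `/persistence` is available.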
@QDespeisseTalan

Hi,

I ran into the same issue.
To fix it, I had to wait for the db2u engine (pod *-db2u-0) to start and then delete the etcd-0 pod.
The pod is then recreated and starts normally.

This is only a workaround to let the installation continue; I guess a patch should be developed.
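The workaround described above could be scripted roughly as follows. This is a sketch under assumptions: the namespace `db2u` is taken from the DNS name in the log, the db2u pod name `c-db2w-shared-db2u-0` is a guess based on the `*-db2u-0` pattern, and the 30 minute timeout is arbitrary. Adjust all three to your environment.

```shell
#!/usr/bin/env bash
# Sketch of the manual workaround: wait for the db2u engine pod, then delete
# the crash-looping etcd pod so its StatefulSet recreates it.
NAMESPACE="${NAMESPACE:-db2u}"                    # assumed from the log's DNS name
DB2U_POD="${DB2U_POD:-c-db2w-shared-db2u-0}"      # assumed name, matches *-db2u-0
ETCD_POD="${ETCD_POD:-c-db2w-shared-etcd-0}"      # pod name from the issue title

restart_etcd_after_db2u() {
  # Block until the db2u engine pod reports Ready; stop if it never does.
  oc wait --for=condition=Ready "pod/${DB2U_POD}" -n "$NAMESPACE" --timeout=30m || return 1
  # Delete the etcd pod; it should be recreated automatically.
  oc delete pod "$ETCD_POD" -n "$NAMESPACE"
}
```

Call `restart_etcd_after_db2u` once the DB2 install has created both pods, then check the result with `oc get pod c-db2w-shared-etcd-0 -n db2u`.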

@andrercm andrercm added the Triage Issue was triaged and acknowledged label Oct 16, 2023
@andrercm
Contributor

This is related to a known db2 issue: https://www.ibm.com/docs/cloud-paks/cp-data/4.6.x?topic=issues-watson-query#known-issues-dv__install-upgrade__title__1

This is not a problem with the ansible-devops automation.
