I've encountered this issue a few times. I have Consul deployed on my minikube cluster using the Helm chart from the tutorial (though I also enabled syncCatalog).
Things work well for a while, but at some point, often after restarting minikube, the Consul server gets into a crash loop:
kubectl logs hashicorp-consul-server-0
==> Starting Consul agent...
Version: '1.9.1'
Node ID: '2b24045a-08bd-14cb-669c-4f6e4177a10d'
Node name: 'hashicorp-consul-server-0'
Datacenter: 'minidc' (Segment: '<all>')
Server: true (Bootstrap: true)
Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
Cluster Addr: 172.17.0.5 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
==> Log data will now stream in as it occurs:
2021-01-20T17:17:22.452Z [WARN] agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
2021-01-20T17:17:22.452Z [WARN] agent: bootstrap = true: do not enable unless necessary
2021-01-20T17:17:22.558Z [WARN] agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
2021-01-20T17:17:22.558Z [WARN] agent.auto_config: bootstrap = true: do not enable unless necessary
2021-01-20T17:17:22.665Z [INFO] agent.server.raft: restored from snapshot: id=8-16385-1611097981735
2021-01-20T17:17:22.955Z [INFO] agent.server.raft: initial configuration: index=18522 servers="[{Suffrage:Voter ID:2b24045a-08bd-14cb-669c-4f6e4177a10d Address:172.17.0.8:8300}]"
2021-01-20T17:17:22.955Z [INFO] agent.server.raft: entering follower state: follower="Node at 172.17.0.5:8300 [Follower]" leader=
2021-01-20T17:17:22.956Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: hashicorp-consul-server-0.minidc 172.17.0.5
2021-01-20T17:17:22.956Z [WARN] agent.server.serf.wan: serf: Failed to re-join any previously known node
2021-01-20T17:17:22.957Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: hashicorp-consul-server-0 172.17.0.5
2021-01-20T17:17:22.957Z [INFO] agent.router: Initializing LAN area manager
2021-01-20T17:17:22.957Z [INFO] agent.server.serf.lan: serf: Attempting re-join to previously known node: minikube: 172.17.0.4:8301
2021-01-20T17:17:22.957Z [INFO] agent.server: Handled event for server in area: event=member-join server=hashicorp-consul-server-0.minidc area=wan
2021-01-20T17:17:22.957Z [INFO] agent.server: Adding LAN server: server="hashicorp-consul-server-0 (Addr: tcp/172.17.0.5:8300) (DC: minidc)"
2021-01-20T17:17:22.957Z [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=udp
2021-01-20T17:17:22.957Z [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=tcp
2021-01-20T17:17:23.051Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: minikube 172.17.0.4
2021-01-20T17:17:23.051Z [INFO] agent: Starting server: address=[::]:8500 network=tcp protocol=http
2021-01-20T17:17:23.150Z [WARN] agent: DEPRECATED Backwards compatibility with pre-1.9 metrics enabled. These metrics will be removed in a future version of Consul. Set `telemetry { disable_compat_1.9 = true }` to disable them.
2021-01-20T17:17:23.052Z [INFO] agent.server.serf.lan: serf: Re-joined to previously known node: minikube: 172.17.0.4:8301
2021-01-20T17:17:23.150Z [INFO] agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
2021-01-20T17:17:23.250Z [INFO] agent: Joining cluster...: cluster=LAN
2021-01-20T17:17:23.250Z [INFO] agent: (LAN) joining: lan_addresses=[hashicorp-consul-server-0.hashicorp-consul-server.default.svc:8301]
2021-01-20T17:17:23.250Z [INFO] agent: started state syncer
==> Consul agent running!
2021-01-20T17:17:23.350Z [INFO] agent: (LAN) joined: number_of_nodes=1
2021-01-20T17:17:23.350Z [INFO] agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=1
2021-01-20T17:17:30.356Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2021-01-20T17:17:30.684Z [WARN] agent.server.raft: heartbeat timeout reached, starting election: last-leader=
2021-01-20T17:17:30.684Z [INFO] agent.server.raft: entering candidate state: node="Node at 172.17.0.5:8300 [Candidate]" term=51
2021-01-20T17:17:30.704Z [INFO] agent.server.raft: election won: tally=1
2021-01-20T17:17:30.704Z [INFO] agent.server.raft: entering leader state: leader="Node at 172.17.0.5:8300 [Leader]"
2021-01-20T17:17:30.704Z [INFO] agent.server: cluster leadership acquired
2021-01-20T17:17:30.705Z [INFO] agent.server: New leader elected: payload=hashicorp-consul-server-0
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x7ad532]
goroutine 109 [running]:
github.com/hashicorp/go-immutable-radix.(*Iterator).Next(0xc000f73820, 0xc000f72fa0, 0x0, 0x0, 0xc000908fd0, 0x0, 0xffffffffffffffff)
/go/pkg/mod/github.com/hashicorp/go-immutable-radix@v1.3.0/iter.go:178 +0xb2
github.com/hashicorp/go-memdb.(*radixIterator).Next(0xc000908d50, 0xc0006b4e20, 0x485b)
/go/pkg/mod/github.com/hashicorp/go-memdb@v1.3.0/txn.go:895 +0x2e
github.com/hashicorp/consul/agent/consul/state.cleanupMeshTopology(0x38f5800, 0xc0006b4e20, 0x485b, 0xc000603b00, 0x485b, 0xc000603c70)
/home/circleci/project/consul/agent/consul/state/catalog.go:3271 +0x36c
github.com/hashicorp/consul/agent/consul/state.(*Store).deleteServiceTxn(0xc00089e6c0, 0x38f5800, 0xc0006b4e20, 0x485b, 0xc000e75728, 0x8, 0xc00020c5a0, 0x49, 0xc000603c70, 0x0, ...)
/home/circleci/project/consul/agent/consul/state/catalog.go:1542 +0x8c5
github.com/hashicorp/consul/agent/consul/state.(*Store).deleteNodeTxn(0xc00089e6c0, 0x38f5800, 0xc0006b4e20, 0x485b, 0xc000e75728, 0x8, 0xb25ddc, 0xc000f9b860)
/home/circleci/project/consul/agent/consul/state/catalog.go:715 +0x62d
github.com/hashicorp/consul/agent/consul/state.(*Store).DeleteNode(0xc00089e6c0, 0x485b, 0xc000e75728, 0x8, 0x0, 0x0)
/home/circleci/project/consul/agent/consul/state/catalog.go:648 +0xbb
github.com/hashicorp/consul/agent/consul/fsm.(*FSM).applyDeregister(0xc000c879e0, 0xc000979641, 0x3c, 0x3c, 0x485b, 0x0, 0x0)
/home/circleci/project/consul/agent/consul/fsm/commands_oss.go:171 +0x41a
github.com/hashicorp/consul/agent/consul/fsm.NewFromDeps.func1(0xc000979641, 0x3c, 0x3c, 0x485b, 0xc0001376d0, 0xc000999c80)
/home/circleci/project/consul/agent/consul/fsm/fsm.go:99 +0x56
github.com/hashicorp/consul/agent/consul/fsm.(*FSM).Apply(0xc000c879e0, 0xc000b98aa0, 0x0, 0x0)
/home/circleci/project/consul/agent/consul/fsm/fsm.go:133 +0x1b6
github.com/hashicorp/go-raftchunking.(*ChunkingFSM).Apply(0xc000c8b740, 0xc000b98aa0, 0x5191aa0, 0xbffa374b43333586)
/go/pkg/mod/github.com/hashicorp/go-raftchunking@v0.6.1/fsm.go:66 +0x5b
github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc000963050)
/go/pkg/mod/github.com/hashicorp/raft@v1.2.0/fsm.go:90 +0x2c2
github.com/hashicorp/raft.(*Raft).runFSM.func2(0xc000b96000, 0x40, 0x40)
/go/pkg/mod/github.com/hashicorp/raft@v1.2.0/fsm.go:113 +0x75
github.com/hashicorp/raft.(*Raft).runFSM(0xc00025ec00)
/go/pkg/mod/github.com/hashicorp/raft@v1.2.0/fsm.go:219 +0x3c4
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc00025ec00, 0xc000da3c80)
/go/pkg/mod/github.com/hashicorp/raft@v1.2.0/state.go:146 +0x55
created by github.com/hashicorp/raft.(*raftState).goFunc
/go/pkg/mod/github.com/hashicorp/raft@v1.2.0/state.go:144 +0x66
The only way I've found to fix this is to uninstall Consul via Helm, delete the persistent volume, and then reinstall Consul to the cluster.
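For reference, this is roughly the recovery sequence I run. It's a sketch under assumptions: the Helm release is named `hashicorp`, it lives in the `default` namespace, and the PVC follows the chart's default `data-<namespace>-<release>-consul-server-<n>` naming; adjust these to match your cluster.

```shell
# Assumed release name "hashicorp" and namespace "default" -- adjust as needed.

# 1. Remove the Consul release (pods, services, etc.)
helm uninstall hashicorp --namespace default

# 2. Delete the server's PVC so the corrupted data directory is discarded.
#    (The bound PV is then reclaimed according to its reclaim policy.)
kubectl delete pvc data-default-hashicorp-consul-server-0 --namespace default

# 3. Reinstall with the same values used originally.
helm install hashicorp hashicorp/consul --namespace default \
  --set global.datacenter=minidc \
  --set syncCatalog.enabled=true
```

This throws away all of the server's raft state, so any registered services and KV data are lost, which is why it's a workaround rather than a fix.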
I saved the problematic hostpath-provisioner directory, but I'm hesitant to upload it since I don't know what data it contains.