
hashicorp-consul-server-0 pod is dying with SIGSEGV #9600

Closed
aluchko opened this issue Jan 20, 2021 · 1 comment


aluchko commented Jan 20, 2021

I've encountered this issue a few times. I have Consul deployed on my minikube cluster using the Helm chart listed in the tutorial (though I also enabled syncCatalog).
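For reference, the syncCatalog toggle lives in the Helm values file passed to `helm install`. A minimal sketch matching the settings visible in the logs below (single server, datacenter `minidc`); the exact file contents here are an assumption, not the reporter's actual values file:

```yaml
# helm-consul-values.yaml (minimal sketch, assumed contents)
global:
  datacenter: minidc
server:
  replicas: 1
  bootstrapExpect: 1
syncCatalog:
  enabled: true
```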

Things seem to work well for a while, but at some point, often after restarting minikube, the Consul server gets into a crash loop:

kubectl logs hashicorp-consul-server-0
==> Starting Consul agent...
           Version: '1.9.1'
           Node ID: '2b24045a-08bd-14cb-669c-4f6e4177a10d'
         Node name: 'hashicorp-consul-server-0'
        Datacenter: 'minidc' (Segment: '<all>')
            Server: true (Bootstrap: true)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
      Cluster Addr: 172.17.0.5 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

    2021-01-20T17:17:22.452Z [WARN]  agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
    2021-01-20T17:17:22.452Z [WARN]  agent: bootstrap = true: do not enable unless necessary
    2021-01-20T17:17:22.558Z [WARN]  agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
    2021-01-20T17:17:22.558Z [WARN]  agent.auto_config: bootstrap = true: do not enable unless necessary
    2021-01-20T17:17:22.665Z [INFO]  agent.server.raft: restored from snapshot: id=8-16385-1611097981735
    2021-01-20T17:17:22.955Z [INFO]  agent.server.raft: initial configuration: index=18522 servers="[{Suffrage:Voter ID:2b24045a-08bd-14cb-669c-4f6e4177a10d Address:172.17.0.8:8300}]"
    2021-01-20T17:17:22.955Z [INFO]  agent.server.raft: entering follower state: follower="Node at 172.17.0.5:8300 [Follower]" leader=
    2021-01-20T17:17:22.956Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: hashicorp-consul-server-0.minidc 172.17.0.5
    2021-01-20T17:17:22.956Z [WARN]  agent.server.serf.wan: serf: Failed to re-join any previously known node
    2021-01-20T17:17:22.957Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: hashicorp-consul-server-0 172.17.0.5
    2021-01-20T17:17:22.957Z [INFO]  agent.router: Initializing LAN area manager
    2021-01-20T17:17:22.957Z [INFO]  agent.server.serf.lan: serf: Attempting re-join to previously known node: minikube: 172.17.0.4:8301
    2021-01-20T17:17:22.957Z [INFO]  agent.server: Handled event for server in area: event=member-join server=hashicorp-consul-server-0.minidc area=wan
    2021-01-20T17:17:22.957Z [INFO]  agent.server: Adding LAN server: server="hashicorp-consul-server-0 (Addr: tcp/172.17.0.5:8300) (DC: minidc)"
    2021-01-20T17:17:22.957Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=udp
    2021-01-20T17:17:22.957Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
    2021-01-20T17:17:23.051Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: minikube 172.17.0.4
    2021-01-20T17:17:23.051Z [INFO]  agent: Starting server: address=[::]:8500 network=tcp protocol=http
    2021-01-20T17:17:23.150Z [WARN]  agent: DEPRECATED Backwards compatibility with pre-1.9 metrics enabled. These metrics will be removed in a future version of Consul. Set `telemetry { disable_compat_1.9 = true }` to disable them.
    2021-01-20T17:17:23.052Z [INFO]  agent.server.serf.lan: serf: Re-joined to previously known node: minikube: 172.17.0.4:8301
    2021-01-20T17:17:23.150Z [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
    2021-01-20T17:17:23.250Z [INFO]  agent: Joining cluster...: cluster=LAN
    2021-01-20T17:17:23.250Z [INFO]  agent: (LAN) joining: lan_addresses=[hashicorp-consul-server-0.hashicorp-consul-server.default.svc:8301]
    2021-01-20T17:17:23.250Z [INFO]  agent: started state syncer
==> Consul agent running!
    2021-01-20T17:17:23.350Z [INFO]  agent: (LAN) joined: number_of_nodes=1
    2021-01-20T17:17:23.350Z [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=1
    2021-01-20T17:17:30.356Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
    2021-01-20T17:17:30.684Z [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader=
    2021-01-20T17:17:30.684Z [INFO]  agent.server.raft: entering candidate state: node="Node at 172.17.0.5:8300 [Candidate]" term=51
    2021-01-20T17:17:30.704Z [INFO]  agent.server.raft: election won: tally=1
    2021-01-20T17:17:30.704Z [INFO]  agent.server.raft: entering leader state: leader="Node at 172.17.0.5:8300 [Leader]"
    2021-01-20T17:17:30.704Z [INFO]  agent.server: cluster leadership acquired
    2021-01-20T17:17:30.705Z [INFO]  agent.server: New leader elected: payload=hashicorp-consul-server-0
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x7ad532]

goroutine 109 [running]:
github.com/hashicorp/go-immutable-radix.(*Iterator).Next(0xc000f73820, 0xc000f72fa0, 0x0, 0x0, 0xc000908fd0, 0x0, 0xffffffffffffffff)
	/go/pkg/mod/github.com/hashicorp/go-immutable-radix@v1.3.0/iter.go:178 +0xb2
github.com/hashicorp/go-memdb.(*radixIterator).Next(0xc000908d50, 0xc0006b4e20, 0x485b)
	/go/pkg/mod/github.com/hashicorp/go-memdb@v1.3.0/txn.go:895 +0x2e
github.com/hashicorp/consul/agent/consul/state.cleanupMeshTopology(0x38f5800, 0xc0006b4e20, 0x485b, 0xc000603b00, 0x485b, 0xc000603c70)
	/home/circleci/project/consul/agent/consul/state/catalog.go:3271 +0x36c
github.com/hashicorp/consul/agent/consul/state.(*Store).deleteServiceTxn(0xc00089e6c0, 0x38f5800, 0xc0006b4e20, 0x485b, 0xc000e75728, 0x8, 0xc00020c5a0, 0x49, 0xc000603c70, 0x0, ...)
	/home/circleci/project/consul/agent/consul/state/catalog.go:1542 +0x8c5
github.com/hashicorp/consul/agent/consul/state.(*Store).deleteNodeTxn(0xc00089e6c0, 0x38f5800, 0xc0006b4e20, 0x485b, 0xc000e75728, 0x8, 0xb25ddc, 0xc000f9b860)
	/home/circleci/project/consul/agent/consul/state/catalog.go:715 +0x62d
github.com/hashicorp/consul/agent/consul/state.(*Store).DeleteNode(0xc00089e6c0, 0x485b, 0xc000e75728, 0x8, 0x0, 0x0)
	/home/circleci/project/consul/agent/consul/state/catalog.go:648 +0xbb
github.com/hashicorp/consul/agent/consul/fsm.(*FSM).applyDeregister(0xc000c879e0, 0xc000979641, 0x3c, 0x3c, 0x485b, 0x0, 0x0)
	/home/circleci/project/consul/agent/consul/fsm/commands_oss.go:171 +0x41a
github.com/hashicorp/consul/agent/consul/fsm.NewFromDeps.func1(0xc000979641, 0x3c, 0x3c, 0x485b, 0xc0001376d0, 0xc000999c80)
	/home/circleci/project/consul/agent/consul/fsm/fsm.go:99 +0x56
github.com/hashicorp/consul/agent/consul/fsm.(*FSM).Apply(0xc000c879e0, 0xc000b98aa0, 0x0, 0x0)
	/home/circleci/project/consul/agent/consul/fsm/fsm.go:133 +0x1b6
github.com/hashicorp/go-raftchunking.(*ChunkingFSM).Apply(0xc000c8b740, 0xc000b98aa0, 0x5191aa0, 0xbffa374b43333586)
	/go/pkg/mod/github.com/hashicorp/go-raftchunking@v0.6.1/fsm.go:66 +0x5b
github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc000963050)
	/go/pkg/mod/github.com/hashicorp/raft@v1.2.0/fsm.go:90 +0x2c2
github.com/hashicorp/raft.(*Raft).runFSM.func2(0xc000b96000, 0x40, 0x40)
	/go/pkg/mod/github.com/hashicorp/raft@v1.2.0/fsm.go:113 +0x75
github.com/hashicorp/raft.(*Raft).runFSM(0xc00025ec00)
	/go/pkg/mod/github.com/hashicorp/raft@v1.2.0/fsm.go:219 +0x3c4
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc00025ec00, 0xc000da3c80)
	/go/pkg/mod/github.com/hashicorp/raft@v1.2.0/state.go:146 +0x55
created by github.com/hashicorp/raft.(*raftState).goFunc
	/go/pkg/mod/github.com/hashicorp/raft@v1.2.0/state.go:144 +0x66
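For readers unfamiliar with this panic class: in Go, dereferencing a nil pointer inside a method raises exactly this recoverable runtime error, which crashes the process with SIGSEGV when nothing recovers it. A minimal, hypothetical sketch of the pattern (these types are illustrative only, not Consul's or go-immutable-radix's code):

```go
package main

import "fmt"

type node struct{ leaf *string }

type iterator struct{ current *node }

// Next dereferences the internal node pointer without a nil check,
// mirroring the failure mode in the trace above.
func (it *iterator) Next() (string, bool) {
	return *it.current.leaf, true // panics when it.current is nil
}

// advance recovers the panic and returns its message for inspection.
func advance(it *iterator) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	it.Next()
	return ""
}

func main() {
	it := &iterator{} // current left nil, like a stale iterator over freed state
	fmt.Println(advance(it))
	// → runtime error: invalid memory address or nil pointer dereference
}
```

Without the `recover`, the runtime prints the goroutine dump seen above and the pod exits, which is why Kubernetes reports a crash loop.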

The only way I've found to fix this is to uninstall Consul via Helm, delete the persistent volume, and then reinstall Consul into the cluster:

$ helm uninstall hashicorp
$ kubectl delete -n default persistentvolumeclaim data-default-hashicorp-consul-server-0
persistentvolumeclaim "data-default-hashicorp-consul-server-0" deleted
$ kubectl delete persistentvolume pvc-066ace39-2807-4c53-8b1f-91cc6e9a5f51
persistentvolume "pvc-066ace39-2807-4c53-8b1f-91cc6e9a5f51" deleted
$ helm install hashicorp hashicorp/consul -f helm-consul-values.yaml 

I saved the problematic hostpath-provisioner directory, but I'm hesitant to upload it since I don't know what data it contains.

@ghost ghost added the crash label Jan 20, 2021
dnephin (Contributor) commented Jan 20, 2021

Thank you for the bug report! This looks like the same issue as #9566. We just finished fixing this bug, and a release with the fix will be going out soon.
