agent crash on Consul check sync failures #24512

Closed
lmorel3 opened this issue Nov 20, 2024 · 3 comments · Fixed by #24513

Comments

@lmorel3

lmorel3 commented Nov 20, 2024

Nomad version

1.9.3

Operating system and Environment details

[screenshot of environment details, not preserved]

Issue

I've upgraded from Nomad 1.4.3 to 1.9.3 and everything works fine, except that the server (not the clients) sometimes crashes and I need to restart it (it runs as a Windows service).

I can't figure out why.

Nomad Server logs (if appropriate)

==> WARNING: mTLS is not configured - Nomad is not secure without mTLS!
==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
==> Failed to check for updates: Get "https://checkpoint-api.hashicorp.com/v1/check/nomad?arch=amd64&os=windows&signature=1f63914d-5e2a-c019-ae76-2330089cda17&version=1.9.3": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x8 pc=0x2642f7b]

goroutine 10 [running]:
github.com/hashicorp/nomad/command/agent/consul.(*ServiceClient).sync(0xc0006ebd40, 0x0)
	github.com/hashicorp/nomad/command/agent/consul/service_client.go:1108 +0xf1b
github.com/hashicorp/nomad/command/agent/consul.(*ServiceClient).Run(0xc0006ebd40)
	github.com/hashicorp/nomad/command/agent/consul/service_client.go:891 +0x407
created by github.com/hashicorp/nomad/command/agent/consul.(*ServiceClientWrapper).Run in goroutine 1
	github.com/hashicorp/nomad/command/agent/consul/service_client.go:488 +0xb5
@tgross
Member

tgross commented Nov 20, 2024

Hi @lmorel3. This was pretty quick to debug and I've got a fix up here: #24513

@tgross tgross self-assigned this Nov 20, 2024
@tgross tgross added this to the 1.9.x milestone Nov 20, 2024
tgross added a commit that referenced this issue Nov 20, 2024
When the service client syncs to Consul, we accumulate service sync errors in a
multierror before reading all the local checks. If the API call to the local
checks fails, we either return that error or append it to the multierror and
return the set of errors. But `multierror.Error.Len()` doesn't nil-check, so we
need to do this ourselves.

I've also made a quick pass through the rest of the code base looking for
multierror `Len` method calls to see if we have this pattern elsewhere.

Fixes: #24512
@lmorel3
Author

lmorel3 commented Nov 20, 2024

Wow, dude, you're so quick! Thank you :)

Good job.

@lmorel3 lmorel3 closed this as completed Nov 20, 2024
tgross added a commit that referenced this issue Nov 20, 2024
@tgross tgross changed the title from "Nomad 1.9.+ : crash" to "agent crash on Consul check sync failures" Nov 20, 2024
@lmorel3
Author

lmorel3 commented Dec 2, 2024

@tgross are you planning a release soon? :)
