Backport of Fix nil pointer dereference if alloc has nil Job into release/1.5.x #19980

Merged: 1 commit into release/1.5.x on Feb 14, 2024

hc-github-team-nomad-core

Backport

This PR is auto-generated from #19972 to be assessed for backporting due to the inclusion of the label backport/1.5.x.

The below text is copied from the body of the original PR.


We encountered the following on one of our production hosts:

Nomad v1.6.1
BuildDate 2023-07-21T13:49:42Z
Revision 515895c7690cdc72278018dc5dc58aca41204ccc
Feb 13 22:11:31 s143 nomad-client[52792]: panic: runtime error: invalid memory address or nil pointer dereference
Feb 13 22:11:31 s143 nomad-client[52792]: [signal SIGSEGV: segmentation violation code=0x1 addr=0xe8 pc=0x1c3a9be]
Feb 13 22:11:31 s143 nomad-client[52792]: goroutine 1 [running]:
Feb 13 22:11:31 s143 nomad-client[52792]: github.com/hashicorp/nomad/nomad/structs.(*Job).LookupTaskGroup(...)
Feb 13 22:11:31 s143 nomad-client[52792]: #011github.com/hashicorp/nomad/nomad/structs/structs.go:4805
Feb 13 22:11:31 s143 nomad-client[52792]: github.com/hashicorp/nomad/client.(*Client).hasLocalState(0xc000004c00, 0xc001000200)
Feb 13 22:11:31 s143 nomad-client[52792]: #011github.com/hashicorp/nomad/client/client.go:1309 +0x3e
Feb 13 22:11:31 s143 nomad-client[52792]: github.com/hashicorp/nomad/client.(*Client).restoreState(0xc000004c00)
Feb 13 22:11:31 s143 nomad-client[52792]: #011github.com/hashicorp/nomad/client/client.go:1202 +0x25e
Feb 13 22:11:31 s143 nomad-client[52792]: github.com/hashicorp/nomad/client.NewClient(0xc000251b80, {0x3536a48?, 0xc0006aa020}, {0x352c420?, 0xc000274a50}, {0x354b660?, 0xc00084ca50}, 0xc?)
Feb 13 22:11:31 s143 nomad-client[52792]: #011github.com/hashicorp/nomad/client/client.go:560 +0x21be
Feb 13 22:11:31 s143 nomad-client[52792]: github.com/hashicorp/nomad/command/agent.(*Agent).setupClient(0xc000328360)
Feb 13 22:11:31 s143 nomad-client[52792]: #011github.com/hashicorp/nomad/command/agent/agent.go:1082 +0x2e5
Feb 13 22:11:31 s143 nomad-client[52792]: github.com/hashicorp/nomad/command/agent.NewAgent(0xc001000800, {0x356fa48?, 0xc00061a1e0}, {0x3531800?, 0xc00100c1f8}, 0xc001070ff0)
Feb 13 22:11:31 s143 nomad-client[52792]: #011github.com/hashicorp/nomad/command/agent/agent.go:152 +0x208
Feb 13 22:11:31 s143 nomad-client[52792]: github.com/hashicorp/nomad/command/agent.(*Command).setupAgent(0xc000ef8c00, 0xc001000800, {0x356fa48, 0xc00061a1e0}, {0x3531800, 0xc00100c1f8}, 0x0?)
Feb 13 22:11:31 s143 nomad-client[52792]: #011github.com/hashicorp/nomad/command/agent/command.go:568 +0xaa
Feb 13 22:11:31 s143 nomad-client[52792]: github.com/hashicorp/nomad/command/agent.(*Command).Run(0xc000ef8c00, {0xc0001a61a0, 0x4, 0x4})
Feb 13 22:11:31 s143 nomad-client[52792]: #011github.com/hashicorp/nomad/command/agent/command.go:774 +0x631
Feb 13 22:11:31 s143 nomad-client[52792]: github.com/mitchellh/cli.(*CLI).Run(0xc000e67e00)
Feb 13 22:11:31 s143 nomad-client[52792]: #011github.com/mitchellh/cli@v1.1.5/cli.go:262 +0x5f8
Feb 13 22:11:31 s143 nomad-client[52792]: main.Run({0xc0001a6190, 0x5, 0x5})
Feb 13 22:11:31 s143 nomad-client[52792]: #011github.com/hashicorp/nomad/main.go:110 +0x28a
Feb 13 22:11:31 s143 nomad-client[52792]: main.main()
Feb 13 22:11:31 s143 nomad-client[52792]: #011github.com/hashicorp/nomad/main.go:80 +0x4e

We were able to resolve the issue by deleting state.db and state.db.backup on that host.

I believe there must have been some corrupt state stored in the DB that somehow decoded to an alloc with a nil Job.
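The trace is consistent with that reading: `LookupTaskGroup` is called from `hasLocalState` on the alloc's `Job`, and the fault address `0xe8` is a small offset, which is what a field read through a nil struct pointer typically looks like. Below is a minimal sketch of the failure mode and a nil guard. The types are simplified stand-ins rather than Nomad's real `structs.Job` and `structs.Allocation`, and `hasUsableJob` is a hypothetical helper, not the function changed in the original PR.

```go
package main

import "fmt"

// TaskGroup, Job and Allocation are simplified stand-ins for the Nomad
// structs named in the trace; the field sets here are assumptions.
type TaskGroup struct{ Name string }

type Job struct{ TaskGroups []*TaskGroup }

type Allocation struct {
	ID        string
	TaskGroup string
	Job       *Job
}

// LookupTaskGroup reads j.TaskGroups, so calling it on a nil *Job
// panics with a nil pointer dereference.
func (j *Job) LookupTaskGroup(name string) *TaskGroup {
	for _, tg := range j.TaskGroups {
		if tg != nil && tg.Name == name {
			return tg
		}
	}
	return nil
}

// hasUsableJob is a hypothetical guard: it rejects an alloc whose Job
// is nil before any method that dereferences the Job is called.
func hasUsableJob(alloc *Allocation) bool {
	if alloc == nil || alloc.Job == nil {
		return false
	}
	return alloc.Job.LookupTaskGroup(alloc.TaskGroup) != nil
}

func main() {
	// An alloc as it might look after decoding corrupt state: no Job.
	corrupt := &Allocation{ID: "a1", TaskGroup: "web"}
	healthy := &Allocation{
		ID:        "a2",
		TaskGroup: "web",
		Job:       &Job{TaskGroups: []*TaskGroup{{Name: "web"}}},
	}
	fmt.Println(hasUsableJob(corrupt), hasUsableJob(healthy))
}
```

With a guard like this, an alloc restored without a Job is treated as having no usable state instead of crashing the client at startup.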


Overview of commits

@tgross merged commit 9022b71 into release/1.5.x on Feb 14, 2024
23 of 26 checks passed
@tgross deleted the backport/main/deeply-choice-lamprey branch on February 14, 2024 at 16:27

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked this conversation as resolved and limited it to collaborators on Jan 19, 2025

3 participants