-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider all system jobs for a new node #11054
Conversation
When a node becomes ready, create an eval for all system jobs across namespaces. The previous code uses `job.ID` to deduplicate evals, but that ignores the job namespace. Thus if there are multiple jobs in different namespaces sharing the same ID/Name, only one will be considered for running in the new node. Thus, Nomad may skip running some system jobs in that node.
nomad/node_endpoint.go
Outdated
@@ -1418,10 +1418,10 @@ func (n *Node) createNodeEvals(nodeID string, nodeIndex uint64) ([]string, uint6 | |||
// Create an evaluation for each system job. | |||
for _, job := range sysJobs { | |||
// Still dedup on JobID as the node may already have the system job. | |||
if _, ok := jobIDs[job.ID]; ok { | |||
if _, ok := jobIDs[*job.NamespacedID()]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of scope: Should we drop the pointer from Job.NamespacedID() *NamespacedID
so it matches the behavior of structs.NewNamespacedID(...)
and Allocation.JobNamespacedID()
?
Derefs without nil checks just always scare me even if they're "known" to be safe through manual code analysis.
NamespacedID is never nil and is very light (two string fields) that isn't in the hot path. So it's safe to return and use the struct directly. Avoiding using pointers here avoids the risk of dereferencing a nil pointer.
When a node becomes ready, create an eval for all system jobs across namespaces. The previous code uses `job.ID` to deduplicate evals, but that ignores the job namespace. Thus if there are multiple jobs in different namespaces sharing the same ID/Name, only one will be considered for running in the new node. Thus, Nomad may skip running some system jobs in that node.
When a node becomes ready, create an eval for all system jobs across namespaces. The previous code uses `job.ID` to deduplicate evals, but that ignores the job namespace. Thus if there are multiple jobs in different namespaces sharing the same ID/Name, only one will be considered for running in the new node. Thus, Nomad may skip running some system jobs in that node.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
When a node becomes ready, create an eval for all system jobs across
namespaces.
The previous code uses
job.ID
to deduplicate evals, but that ignoresthe job namespace. Thus if there are multiple jobs in different
namespaces sharing the same ID/Name, only one will be considered for
running in the new node. Thus, Nomad may skip running some system jobs
in that node.
I've added a failing test in the first comment to highlight the issue: https://app.circleci.com/pipelines/github/hashicorp/nomad/17325/workflows/0a3cbdb9-8433-403d-b226-97eb0f9218c5/jobs/170084 that passes with the fix.