Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider all system jobs for a new node #11054

Merged
merged 4 commits into from
Aug 18, 2021
Merged

Consider all system jobs for a new node #11054

merged 4 commits into from
Aug 18, 2021

Conversation

notnoop
Copy link
Contributor

@notnoop notnoop commented Aug 13, 2021

When a node becomes ready, create an eval for all system jobs across
namespaces.

The previous code uses job.ID to deduplicate evals, but that ignores
the job namespace. Thus if there are multiple jobs in different
namespaces sharing the same ID/Name, only one will be considered for
running in the new node. Thus, Nomad may skip running some system jobs
in that node.

I've added a failing test in the first comment to highlight the issue: https://app.circleci.com/pipelines/github/hashicorp/nomad/17325/workflows/0a3cbdb9-8433-403d-b226-97eb0f9218c5/jobs/170084 that passes with the fix.

Mahmood Ali added 2 commits August 13, 2021 10:40
When a node becomes ready, create an eval for all system jobs across
namespaces.

The previous code uses `job.ID` to deduplicate evals, but that ignores
the job namespace. Thus if there are multiple jobs in different
namespaces sharing the same ID/Name, only one will be considered for
running in the new node. Thus, Nomad may skip running some system jobs
in that node.
@@ -1418,10 +1418,10 @@ func (n *Node) createNodeEvals(nodeID string, nodeIndex uint64) ([]string, uint6
// Create an evaluation for each system job.
for _, job := range sysJobs {
// Still dedup on JobID as the node may already have the system job.
if _, ok := jobIDs[job.ID]; ok {
if _, ok := jobIDs[*job.NamespacedID()]; ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope: Should we drop the pointer from Job.NamespacedID() *NamespacedID so it matches the behavior of structs.NewNamespacedID(...) and Allocation.JobNamespacedID()?

Derefs without nil checks just always scare me even if they're "known" to be safe through manual code analysis.

Mahmood Ali added 2 commits August 18, 2021 09:17
NamespacedID is never nil and is very light (two string fields) that
isn't in the hot path. So it's safe to return and use the struct
directly.

Avoiding using pointers here avoids the risk of dereferencing a nil
pointer.
@vercel vercel bot temporarily deployed to Preview – nomad August 18, 2021 13:30 Inactive
@notnoop notnoop merged commit 327d461 into main Aug 18, 2021
@notnoop notnoop deleted the b-multiple-ns-sys-jobs branch August 18, 2021 13:50
notnoop pushed a commit that referenced this pull request Aug 26, 2021
When a node becomes ready, create an eval for all system jobs across
namespaces.

The previous code uses `job.ID` to deduplicate evals, but that ignores
the job namespace. Thus if there are multiple jobs in different
namespaces sharing the same ID/Name, only one will be considered for
running in the new node. Thus, Nomad may skip running some system jobs
in that node.
notnoop pushed a commit that referenced this pull request Aug 26, 2021
When a node becomes ready, create an eval for all system jobs across
namespaces.

The previous code uses `job.ID` to deduplicate evals, but that ignores
the job namespace. Thus if there are multiple jobs in different
namespaces sharing the same ID/Name, only one will be considered for
running in the new node. Thus, Nomad may skip running some system jobs
in that node.
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants