Handle Upgrades and Alloc.TaskResources modification #6922
Conversation
This commit ensures that Alloc.AllocatedResources is properly populated when read from persistence stores (namely Raft and the client state store). The alloc struct may have been written previously by an arbitrarily old version that only populates Alloc.TaskResources.
Now that alloc.Canonicalize() is called on all alloc sources in the client (i.e. on state restore and on RPC fetching), we no longer need to check alloc.TaskResources; alloc.AllocatedResources is always non-nil throughout the alloc runner. That said, we only check for alloc validity early on, so NewTaskRunner and TaskEnv must still check. The TestClient_AddAllocError test validates that behavior.
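For readers skimming the diff, the migration amounts to roughly the following (a simplified sketch, not the exact code in this PR; the helper name and field layout are assumptions based on the discussion below):

```go
// Rough sketch of the schema migration in Canonicalize. toAllocatedResources
// is the old-to-new conversion helper discussed later in this review.
func (a *Allocation) Canonicalize() {
	// Only allocs persisted by pre-0.9 code paths lack AllocatedResources.
	if a.AllocatedResources == nil && a.TaskResources != nil {
		a.AllocatedResources = &AllocatedResources{
			Tasks: toAllocatedResources(a.TaskResources),
		}
	}

	// The embedded Job may also have been written by an old version and need
	// its own schema migration; Job.Canonicalize is nil-safe.
	a.Job.Canonicalize()
}
```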
@@ -603,7 +603,6 @@ func (b *Builder) setAlloc(alloc *structs.Allocation) *Builder {

	tg := alloc.Job.LookupTaskGroup(alloc.TaskGroup)

	// COMPAT(0.11): Remove in 0.11
	b.otherPorts = make(map[string]string, len(tg.Tasks)*2)
	if alloc.AllocatedResources != nil {
Do we need this check anymore? It seems like if AllocatedResources is somehow nil here, we'll end up with an incomplete task environment, which would be harder to detect in testing than a panic.
Sadly we don't have a logger or an error return value here, so our options are limited. If you think risking an incomplete environment is better than risking a panic, let's just add that as a comment here, since the code will look kind of strange down the road.
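Something like the following might be enough (illustrative wording only, not the final text):

```go
if alloc.AllocatedResources != nil {
	// AllocatedResources should always be set by Canonicalize. If an invalid
	// alloc still reaches this point, we accept an incomplete task environment
	// rather than panicking, since there is no logger or error return here.
	// ... populate ports and resource env vars ...
}
```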
I kept this conditional because TestClient_AddAllocError asserts that when an invalid alloc is passed, taskenv doesn't panic and NewTaskRunner returns an error. I'm not sure exactly what conditions the test is meant to exercise, though.
@@ -7527,15 +7527,17 @@ type Allocation struct {

	// the scheduler.
	Resources *Resources

	// COMPAT(0.11): Remove in 0.11
Why can't this be COMPAT(0.12): Remove in 0.12? If both clients and servers upgrade-allocs-on-restore in 0.11, what could still have AllocatedResources==nil in 0.12 (and therefore need these fields to populate AllocRes)? It seems like the only problem would be if someone used a 0.10 or earlier agent with 0.12 code.
(Not that we have to remove it in 0.12, I'm just curious whether we could while maintaining our +/-1 Y version safety.)
If both clients and servers upgrade-allocs-on-restore in 0.11, what could still have AllocatedResources==nil in 0.12 (and therefore need these fields to populate AllocRes)?

If both clients and servers upgrade the on-disk representation, then yes. But we currently don't do that, nor will we with this PR. Consider the case where a cluster starts with Nomad 0.8, and the operator then upgrades in rapid succession through 0.9, 0.10 (with this PR), 0.11, and then 0.12 - so fast that Raft didn't generate a snapshot during these upgrades. In this case, Nomad 0.12 will read the representation that was persisted by 0.8, which lacks AllocatedResources.
To be able to fully remove it, we must augment the recommended upgrade path to ensure the on-disk representation gets upgraded before a user can perform a subsequent upgrade.
Should we take a Raft snapshot on server agent startup?
We can consider it - we'll need to do some vetting and testing before going there. Agent startup is critical to cluster recovery, and I'd be nervous about adding a blocking call that may fail there; if done asynchronously, we'd need to properly indicate to operators when it's safe to upgrade and cope with operators potentially ignoring the warning. Maybe consider it as part of 0.12?
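For reference, a non-blocking variant might look roughly like the sketch below; it assumes direct access to a hashicorp/raft instance (raftInstance) and a logger, whereas Nomad wraps Raft, so the real hook point would differ:

```go
// Hypothetical asynchronous snapshot at server startup: errors are logged
// rather than blocking agent start.
go func() {
	future := raftInstance.Snapshot()
	if err := future.Error(); err != nil {
		logger.Warn("startup snapshot failed; persisted allocs may keep the old schema", "error", err)
	}
}()
```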
Hm, a blocking nomad operator ... command might be sufficient, plus upgrade instructions to use it if "rapidly" upgrading from 0.N to 0.N+2.
Definitely seems best to leave it out of scope for this effort, but could you file an issue describing how it's currently impossible to safely remove deprecated fields that are persisted to Raft? Doesn't seem like anything we need to rush to fix, but I can see it mattering a lot more post-1.0 when people are much slower to upgrade (Consul struggles with this), and may want options to upgrade from 1.N to 1.N+x (where x > 1) quickly and easily.
nomad/structs/structs.go (outdated)

	// TODO: Investigate if we should canonicalize the job
	// it may be out of sync with respect to the original job
	// a.Job.Canonicalize()
Suggested change:
-	// a.Job.Canonicalize()
+	a.Job.Canonicalize()
I think this is the right thing to do:
- On clients, they may have an old version of a job persisted that may not match the expectations of the code running in the agent.
- On servers, I actually don't remember off the top of my head whether Jobs are persisted on Allocs (are allocs normalized only in Raft messages on the wire, or in the state store too 🤔?). Luckily this doesn't matter! Job.Canonicalize handles the job being nil, and Jobs are already canonicalized on state store restore.
So I think we might waste a couple CPU instructions, but it seems necessary on clients at least.
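The nil-safety relied on above is the usual early-return pattern; a sketch, assuming Job.Canonicalize keeps its current shape:

```go
// Sketch: a nil receiver makes Canonicalize a no-op, which is why calling
// a.Job.Canonicalize() unconditionally is safe even when a.Job is nil.
func (j *Job) Canonicalize() {
	if j == nil {
		return
	}
	// ... fill in defaults such as Namespace, canonicalize task groups, etc.
}
```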
nomad/structs/structs.go (outdated)

	// a.Job.Canonicalize()
}

func toAllocatedResources(taskResources map[string]*Resources) map[string]*AllocatedTaskResources {
We may want to just inline this code into Canonicalize, as it seems easy for it to get orphaned here if we are able to remove the canonicalization in the future.
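For context, the helper under discussion does roughly the following conversion (a simplified sketch; the exact field mapping is in the PR diff), which is why inlining it into Canonicalize would keep the migration logic in one place:

```go
// Simplified sketch of the old-to-new resource conversion; field names on the
// Allocated* structs are assumptions here, not a copy of the PR's code.
func toAllocatedResources(taskResources map[string]*Resources) map[string]*AllocatedTaskResources {
	out := make(map[string]*AllocatedTaskResources, len(taskResources))
	for name, r := range taskResources {
		out[name] = &AllocatedTaskResources{
			Cpu:      AllocatedCpuResources{CpuShares: int64(r.CPU)},
			Memory:   AllocatedMemoryResources{MemoryMB: int64(r.MemoryMB)},
			Networks: r.Networks,
		}
	}
	return out
}
```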
@schmichael beat me to finishing the review, so I think he's addressed most of my specific questions/comments already. A more general question: calling Canonicalize in the appropriate places is a new invariant which could lead to panics if we miss it. Is there any way (other than "be careful") that we could guarantee we can't miss it at build/test time?
A more general question: calling Canonicalize in the appropriate places is a new invariant which could lead to panics if we miss it. Is there any way (other than "be careful") that we could guarantee we can't miss it at build/test time?
To clarify, calling Canonicalize isn't a new invariant - it's the pattern we already used for Job and Node migrations before. As you point out, it's fragile and we can miss the subtlety, so generic "be careful" advice isn't so great.
My suggestion would be a combination of codifying the patterns related to persisted-structure compatibility (in a checklist or an engineering guide) and following up later with more e2e tests that exercise upgrade paths and test multiple upgrade combinations. Our current e2e framework is too weak to accommodate tests like that, so we'd need more investment there.
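One lighter-weight guard, short of full e2e upgrade tests, would be unit tests that feed Canonicalize an alloc shaped like an old persisted record; a hypothetical sketch (test name and fixture are illustrative):

```go
// Hypothetical unit test: an alloc carrying only the pre-0.9 TaskResources
// field must come out of Canonicalize with AllocatedResources populated.
func TestAllocation_Canonicalize_OldTaskResources(t *testing.T) {
	alloc := &Allocation{
		TaskResources: map[string]*Resources{
			"web": {CPU: 500, MemoryMB: 256},
		},
		SharedResources: &Resources{DiskMB: 150},
	}

	alloc.Canonicalize()

	if alloc.AllocatedResources == nil {
		t.Fatal("expected AllocatedResources to be populated from TaskResources")
	}
	if _, ok := alloc.AllocatedResources.Tasks["web"]; !ok {
		t.Fatal("expected task 'web' to be migrated into AllocatedResources")
	}
}
```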
alloc.Job may be stale as well and needs to be migrated. It does cost extra cycles, but that should be negligible.
There is a case for always canonicalizing the alloc.Job field when canonicalizing the alloc. I'm less certain of the implications though, and the Job canonicalization hasn't changed for a long time. Here, we special-case client restore from the database as it's probably the most relevant path. When receiving an alloc from RPC, the data should be fresh enough.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
In 0.9, Allocation resources tracking changed, such that Allocation.{TaskResources|Resources|SharedResources} was effectively moved to Allocation.AllocatedResources, and we initially targeted 0.11 to drop checking the old fields.

This strategy is not correct. Because the Allocation is a struct persisted in Raft and in the client state store, and the persisted data may remain untouched across many Nomad upgrades (e.g. if the alloc doesn't change), we cannot simply stop relying on the old fields. Removing the fallback paths would cause a panic.
This change ensures that Allocation modification follows the pattern of Job and Node schema changes. Namely, we introduce a Canonicalize method that applies a struct schema migration on reads from a persisted store (Raft log, Raft snapshot, client state store). This eases compatibility handling, as other components can start relying on Allocation.AllocatedResources immediately and no longer need to fall back to Allocation.TaskResources. We used this Canonicalize pattern when introducing Job.Namespace and Node.SchedulingEligibility, but we missed it here.

It's worth noting that we must preserve Allocation.TaskResources effectively forever. Consider an installation with Nomad 0.1 (pre-namespaces, etc.) running a single alloc, then upgraded sequentially through all intermediate versions up to 0.11 (or a future version) in a short time. We cannot be sure that the alloc or job was updated or that a Raft snapshot was created with the latest schema. As such, we should consult the fallback paths and perform the schema migration indefinitely. We may revisit this if we change the upgrade requirements in a way that enforces creating a snapshot with the latest schema.

Though the Allocation.TaskResources struct must be preserved, the structs.Resources methods are dead code now and can be removed. We only need to preserve the struct type and fields to allow migration.

To reproduce the issue, run
nomad alloc status -json <alloc_id>
and note that "AllocatedResources" is nil in the output, though the COMPAT comments suggest otherwise.

Caveats
We don't anticipate schema migrations to add much overhead here. Canonicalization of already up-to-date allocation structs will be effectively a nil check.
It's unfortunate that Canonicalize in the structs package refers to upgrade-path schema migration handling, while the Canonicalize methods in the api/ package refer to populating a struct with defaults. This overload is a bit confusing. I opted to keep the current pattern; we may consider renaming the structs package method to UpgradeSchema in a follow-up PR.