prioritized client updates #17354

Merged: 3 commits, May 31, 2023
Conversation

@tgross (Member) commented May 30, 2023

The allocrunner sends several updates to the server during the early lifecycle of an allocation and its tasks. Clients batch up allocation updates every 200ms, but experiments like the C2M challenge have shown that even with this batching, servers can be overwhelmed with client updates during high-volume deployments. Benchmarking done in #9451 has shown that client updates can easily represent ~70% of all Nomad Raft traffic.

Each allocation sends many updates during its lifetime, but only those that change the ClientStatus field are critical for progressing a deployment or kicking off a reschedule to recover from failures.

Add a priority to the client allocation sync and update the syncTicker receiver so that we only send an update if there's a high priority update waiting, or on every 5th tick. This means when there are no high priority updates, the client will send updates at most every 1s instead of 200ms. Benchmarks have shown this can reduce overall Raft traffic by 10%, as well as reduce client-to-server RPC traffic.
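A minimal sketch of what that receiver logic amounts to, assuming hypothetical `hasUrgent` and `send` helpers (illustrative only, not the actual client code):

```go
package clientsketch

import "time"

// syncLoop sketches the prioritized sync described above: the ticker still
// fires every 200ms, but a batch is only sent when an urgent update (one that
// changes ClientStatus) is waiting, or on every 5th tick (roughly once a second).
func syncLoop(shutdownCh <-chan struct{}, hasUrgent func() bool, send func()) {
	ticker := time.NewTicker(200 * time.Millisecond)
	defer ticker.Stop()

	ticks := 0
	for {
		select {
		case <-shutdownCh:
			return
		case <-ticker.C:
			ticks++
			if !hasUrgent() && ticks%5 != 0 {
				continue // typical-priority updates wait for the 5th tick
			}
			send() // drain the pending updates and make the update RPC
		}
	}
}
```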

This changeset also switches from a channel-based collection of updates to a shared buffer, so as to split batching from sending and prevent backpressure onto the allocrunner when the RPC is slow. This doesn't have a major performance benefit in the benchmarks but makes the implementation of the prioritized update simpler.
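A rough sketch of that shared buffer with the priority bookkeeping folded in (all of these names and types are invented for illustration; the real code differs): allocrunners write their latest state into a mutex-guarded map keyed by alloc ID, and the sync loop snapshots and clears the map under the lock but performs the RPC outside it, so a slow RPC can no longer back up into the allocrunner the way a full channel would.

```go
package clientsketch

import "sync"

type updatePriority int

const (
	priorityNone updatePriority = iota
	priorityTypical
	priorityUrgent // e.g. a ClientStatus change
)

// allocUpdate is a simplified stand-in for the allocation state the client reports.
type allocUpdate struct {
	AllocID           string
	ClientStatus      string
	DeploymentHealthy bool // stand-in for the richer DeploymentStatus
}

// updateBuffer is the shared buffer: it owns its lock, keeps only the latest
// pending update per allocation, and remembers the highest pending priority.
type updateBuffer struct {
	mu       sync.Mutex
	pending  map[string]*allocUpdate
	priority updatePriority
}

// put is the allocrunner-facing side; it never blocks on the RPC.
func (b *updateBuffer) put(u *allocUpdate, pri updatePriority) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.pending == nil {
		b.pending = make(map[string]*allocUpdate)
	}
	b.pending[u.AllocID] = u // a newer update for an alloc replaces an older one
	if pri > b.priority {
		b.priority = pri
	}
}

// hasUrgent is what the sync loop checks to decide whether to send early.
func (b *updateBuffer) hasUrgent() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.priority >= priorityUrgent
}

// drain snapshots and clears the buffer; the caller sends the returned batch
// without holding the lock, so batching stays decoupled from the RPC.
func (b *updateBuffer) drain() []*allocUpdate {
	b.mu.Lock()
	defer b.mu.Unlock()
	batch := make([]*allocUpdate, 0, len(b.pending))
	for _, u := range b.pending {
		batch = append(batch, u)
	}
	b.pending = nil
	b.priority = priorityNone
	return batch
}
```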

Fixes: #9451

@tgross (Member Author) commented May 30, 2023

Moving this to draft because, as I was re-reading it, I realized there's another minor optimization we can make here that should also make the implementation simpler in the allocrunner. Done.

@@ -765,6 +765,24 @@ func TestClient_SaveRestoreState(t *testing.T) {
		return fmt.Errorf("expected running client status, got %v",
			ar.AllocState().ClientStatus)
	}

tgross (Member Author):
Note for reviewers: this test was actually already racy, but before the changes in this PR the race window was very tight. These fixes for the test remove the race.

@jrasell (Member) left a comment:
LGTM! I added a couple of very minor comments and a question, but nothing blocking.

client/client.go, comment on lines 2227 to 2228:
// filteredAcknowledgedUpdates returns a list of client alloc updates with the
// already-acknowledged updates removed, and the highest priority of any update.
Member:
If I understand correctly, the caller must hold at least a read lock on c.allocUpdatesLock when calling this function? If so, would it be worth adding a note to the function comment about this for future readers?

Member:
I'd go a step further and suggest extracting the allocUpdates map into its own little data structure and let it handle its own locking. There's quite a bit of implementation detail around what it's used for splattered all over these Client functions.

tgross (Member Author):
> If I understand correctly, the caller must hold at least a read lock on c.allocUpdatesLock when calling this function? If so, would it be worth adding a note to the function comment about this for future readers?

Well, it's not a RWMutex, but yes. Really the only reason this is in its own function at all now is that we need to take the c.allocLock to read from the allocrunners, and having it in its own function lets us scope down that lock.

> I'd go a step further and suggest extracting the allocUpdates map into its own little data structure and let it handle its own locking. There's quite a bit of implementation detail around what it's used for splattered all over these Client functions.

So in other words, make updatesToSync and filterAcknowledgedUpdates methods on the allocUpdates data structure? We'd need to pass the *Client as a parameter so that we can access the allocrunners. But yeah that seems reasonable. Let me take a quick try at that.

tgross (Member Author):
@shoenig @jrasell I've refactored this by pulling out the allocUpdates to their own struct pendingAllocUpdates (so as not to conflict with the other struct named allocUpdates which is the updates received from the server!). I think this makes the whole locking situation a lot cleaner. Let me know what you think.

@shoenig (Member) left a comment:
LGTM; just the one locking thing

Comment on lines 1441 to +1442
	case !last.DeploymentStatus.Equal(a.DeploymentStatus):
-		return false
+		return cstructs.AllocUpdatePriorityTypical
Member:
Technically deployments are gated by this field, so it could be considered critical since it can cause a scheduling decision...

...but nothing about deployments is concerned with sub-second latencies, so I think it's fine to leave this as Typical.

If you're in this code again, maybe add a comment pointing out that while deployment status changes are not urgent, they can affect scheduling, but not in a way where sub-second skew is significant.

If you really want to tidy things up, the PR description misses this too:

> Each allocation sends many updates during its lifetime, but only those that change the ClientStatus field are critical for progressing a deployment or kicking off a reschedule to recover from failures.

DeploymentStatus is critical for progressing a deployment as well.

tgross (Member Author):
> Technically deployments are gated by this field, so it could be considered critical since it can cause a scheduling decision...
>
> ...but nothing about deployments is concerned with sub-second latencies, so I think it's fine to leave this as Typical

Somehow I missed that, so yeah, I would've set it to urgent based on the reasoning I had in the PR. I'll keep it as-is (for now at least) and I'll add some commentary here around the reasoning.

	ar.stateLock.RLock()
	defer ar.stateLock.RUnlock()

	last := ar.lastAcknowledgedState
	if last == nil {
-		return false
+		return cstructs.AllocUpdatePriorityTypical
Member:
If we don't know what it was before, how can we assume the change is typical? Seems worth a comment especially since all of the other code in this method must check from Highest Priority to Lowest in order to ensure a change to a low priority field doesn't demote an actually high priority update.

tgross (Member Author):
You're right that we don't know for sure. In practice an allocation will never become healthy quickly enough that the first update we send is that update. That being said, we probably should account for allocations that quickly fail, because there's a bunch of things that can go unrecoverably wrong on the client before we ever hit the task runner, and it'd be nice to be able to send those failure states to the server more quickly.
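To make the trade-offs in this thread concrete, here is a sketch of the priority decision, continuing the invented types from the sketch in the description (the real allocrunner method compares richer state, including the full DeploymentStatus): fields are checked from highest priority to lowest so a low-priority change can't mask an urgent one, and a missing lastAcknowledgedState falls back to typical unless the alloc has already failed, which is the follow-up change referenced in the commits below.

```go
// clientUpdatePriority is an illustrative sketch, not Nomad's actual method.
func clientUpdatePriority(last, current *allocUpdate) updatePriority {
	if last == nil {
		// Nothing has been acknowledged yet. The first update is essentially
		// never the "healthy" one, but a fast unrecoverable failure (for
		// example, an allocrunner hook failing) is worth sending right away.
		if current.ClientStatus == "failed" {
			return priorityUrgent
		}
		return priorityTypical
	}
	switch {
	case last.ClientStatus != current.ClientStatus:
		// Gates deployment progress and rescheduling.
		return priorityUrgent
	case last.DeploymentHealthy != current.DeploymentHealthy:
		// Affects deployments too, but nothing there cares about sub-second skew.
		return priorityTypical
	default:
		return priorityNone
	}
}
```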

tgross added a commit that referenced this pull request Jun 15, 2023
In #17354 we made client updates prioritized to reduce client-to-server
traffic. When the client has no previously-acknowledged update we assume that
the update is of typical priority; although we don't know that for sure, in
practice an allocation will never become healthy quickly enough that the first
update we send is the update saying the alloc is healthy.

But that doesn't account for allocations that quickly fail in an unrecoverable
way because of allocrunner hook failures, and it'd be nice to be able to send
those failure states to the server more quickly. This changeset does so and adds
some extra comments on reasoning behind priority.
tgross added a commit that referenced this pull request Jun 26, 2023
Successfully merging this pull request may close these issues:
client: make batching of allocation updates smarter