kv: enable delegate snapshots #83991
This restores the initial PR from amygao9@7c525d9; it is still a WIP.
Force-pushed 94241ca to 1a6329c.
Force-pushed 1a6329c to a36d1ab.
Force-pushed 2e76245 to 89ad698.
Force-pushed 432c493 to f8d5d89.
Force-pushed f8d5d89 to 0efbfdf.
Force-pushed 7e6fa1d to 5878898.
Force-pushed eefc14a to b7c617d.
andrewbaptist left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @herkolategan, @nvanbenschoten, and @smg260)
pkg/kv/kvserver/replica_command.go line 2861 at r5:
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm still curious about this. Are there descriptor generation changes that don't necessarily mean this, but we're ok with false positives?
Just to record what we discussed, I added a comment to it. This check is probably a little overly strict, which means some delegated snapshots may be rejected unnecessarily. This is not really a problem, and it should be rare. We could likely remove this check in the future.
pkg/kv/kvserver/replica_command.go line 3034 at r5:
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
but that risks holding the constraint for a longer window
The leaseholder also doesn't know which index to bound log truncation up to. It may have applied further than the delegated sender, and it doesn't want to truncate the log any higher than the index at which the delegated snapshot is being sent.
I think this is why the RPC protocol was streaming before. We had envisioned a handshake where the leaseholder would ask the delegate for its applied index, install a constraint at that index (rechecking that it hadn't truncated above this already), then allow the delegated snapshot to proceed.
After discussing the options, I added the log truncation constraint up front on the coordinator. This risks holding the constraint longer than strictly required. There is a path to change this if necessary in the future, but it is unlikely to be a concern for most systems. https://cockroachlabs.atlassian.net/wiki/spaces/~6268113f52310b0068ffd245/pages/2869854482/Delegate+snapshot+overview
There are a few ways to solve this, but I don't think the streaming RPC is correct, since the data required is disjoint for the different parts of the flow. With the latest change, there is a risk that a delegated request will fail because the delegate may not yet have applied up to the leaseholder's latest applied index. If that occurs often, it would be worth changing to retrying on the delegate, since it will "eventually" catch up in most cases.
pkg/kv/kvserver/replica_command.go line 2591 at r6:
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/desc/replID/
Done
pkg/kv/kvserver/replica_command.go line 2611 at r6:
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
small nit: the `0` is unnecessary.
Done
pkg/kv/kvserver/replica_command.go line 2619 at r6:
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can we use `ReplicaSet.GetReplicaDescriptorByID` here?
Done
Force-pushed dd822da to d5bcbc3.
Force-pushed d312c2c to ac66f07.
nvb left a comment
Reviewed 18 of 20 files at r7, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, @herkolategan, and @smg260)
pkg/kv/kvserver/replica_command.go line 2821 at r7:
Term: status.Term, DelegatedSender: sender, FirstIndex: appliedIndex,
Could you add a comment here about why you are setting FirstIndex to the value of appliedIndex on the leaseholder? It's subtle and could benefit from a discussion about the consequences and plans for future improvement.
pkg/kv/kvserver/replica_command.go line 2880 at r7:
// If the generation has changed, this snapshot may be useless, so don't
// attempt to send it.
//NB: This is an overly strict check. If other delegates are added to this
nit: missing a space after // on each line.
Force-pushed dad9c3b to 0e75e33.
andreimatei left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, @herkolategan, @nvanbenschoten, and @smg260)
-- commits line 29 at r8:
nit: consider adding more words to the release note for this awesome work so that the average user can understand it
andrewbaptist left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @herkolategan, @nvanbenschoten, and @smg260)
pkg/kv/kvserver/replica_command.go line 2821 at r7:
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Could you add a comment here about why you are setting FirstIndex to the value of `appliedIndex` on the leaseholder? It's subtle and could benefit from a discussion about the consequences and plans for future improvement.
Done
pkg/kv/kvserver/replica_command.go line 2880 at r7:
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: missing a space after `//` on each line.
Done
Force-pushed 0e75e33 to c8e870b.
nvb left a comment
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, @herkolategan, and @smg260)
pkg/kv/kvserver/replica_command.go line 2821 at r7:
Previously, andrewbaptist (Andrew Baptist) wrote…
Done
Did you miss a git push?
Force-pushed c8e870b to eb03985.
andrewbaptist left a comment
Reviewed 2 of 4 files at r8.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andreimatei, @herkolategan, @nvanbenschoten, and @smg260)
Previously, andreimatei (Andrei Matei) wrote…
nit: consider adding more words to the release note for this awesome work so that the average user can understand it
Thanks, I added some more to the release notes. I will likely publish a short blog post on this as well when 23.1 comes out!
pkg/kv/kvserver/replica_command.go line 2821 at r7:
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Did you miss a `git push`?
I'm not sure what happened; I rewrote it and pushed again.
nvb left a comment
Reviewed 1 of 1 files at r10, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @AlexTalks, @andreimatei, @andrewbaptist, @herkolategan, and @smg260)
Fixes: cockroachdb#42491
This commit allows a snapshot to be sent by a follower instead of the leader of a range. The follower(s) are chosen based on locality to the final recipient of the snapshot. If the follower is not able to quickly send the snapshot, the attempt is aborted and the leader sends the snapshot instead.
By choosing a delegate rather than sending the snapshot directly, WAN traffic can be minimized. Additionally, the snapshot will likely be delivered faster.
There are two settings that control this feature. The first, `kv.snapshot_delegation.num_follower`, controls how many followers are tried as snapshot delegates. If set to 0, snapshot delegation is disabled. The second, `kv.snapshot_delegation_queue.enabled`, controls whether delegated snapshots queue on the delegate or return failure immediately. Failing immediately prevents a delegation request from spending a long time waiting before it is sent.
Before the snapshot is sent from the follower, checks are done to verify that the delegate is able to send a snapshot that will be valid for the recipient. If not, the request is rerouted to the leader.
Release note (performance improvement): Adds delegated snapshots, which can reduce WAN traffic for snapshot movement. If there is another replica for this range with a closer locality than the delegate, the leaseholder will attempt to have that delegate send the snapshot. This is particularly useful in the case of a decommission of a node where most snapshots are transferred to another replica in the same locality.
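The flow described in this commit message can be sketched in Go as follows. This is a simplified model, not the implementation: `replica`, `pickDelegates`, `chooseSender`, ranking by the number of matching leading locality tiers, and the channel-based send slots are all assumptions made for illustration. `numFollowers` plays the role of `kv.snapshot_delegation.num_follower` and `queue` the role of `kv.snapshot_delegation_queue.enabled`; the sketch omits aborting a delegate that cannot send quickly.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// tier is one level of a hypothetical locality hierarchy, e.g. region=us-east1.
type tier struct{ Key, Value string }

// replica is a hypothetical stand-in for a store that can send a snapshot.
// slots is a buffered channel used as a semaphore on concurrent sends.
type replica struct {
	Name     string
	Locality []tier
	slots    chan struct{}
}

var errBusy = errors.New("delegate busy and queueing disabled")

// matchingTiers counts the leading locality tiers that a and b share.
func matchingTiers(a, b []tier) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// pickDelegates ranks followers by locality closeness to the recipient and
// keeps at most numFollowers of them; 0 disables delegation.
func pickDelegates(followers []replica, recipient replica, numFollowers int) []replica {
	if numFollowers <= 0 {
		return nil
	}
	ranked := append([]replica(nil), followers...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return matchingTiers(ranked[i].Locality, recipient.Locality) >
			matchingTiers(ranked[j].Locality, recipient.Locality)
	})
	if len(ranked) > numFollowers {
		ranked = ranked[:numFollowers]
	}
	return ranked
}

// acquire takes a send slot. With queueing enabled it blocks until a slot
// frees up; otherwise it fails immediately when every slot is in use.
func (r replica) acquire(queue bool) error {
	if queue {
		r.slots <- struct{}{}
		return nil
	}
	select {
	case r.slots <- struct{}{}:
		return nil
	default:
		return errBusy
	}
}

// chooseSender tries each delegate in order and falls back to the leader if
// none can take the snapshot. The returned func releases the held slot.
func chooseSender(leader replica, delegates []replica, queue bool) (replica, func()) {
	for _, d := range delegates {
		if d.acquire(queue) == nil {
			return d, func() { <-d.slots }
		}
	}
	return leader, func() {}
}

func main() {
	recipient := replica{Name: "n4", Locality: []tier{{"region", "us-east1"}, {"zone", "b"}}}
	leader := replica{Name: "n1", Locality: []tier{{"region", "us-west1"}, {"zone", "a"}}}
	busy := replica{Name: "n2", Locality: []tier{{"region", "us-east1"}, {"zone", "b"}}, slots: make(chan struct{}, 1)}
	busy.slots <- struct{}{} // n2 is already sending another snapshot
	idle := replica{Name: "n3", Locality: []tier{{"region", "us-east1"}, {"zone", "c"}}, slots: make(chan struct{}, 1)}

	// n2 shares two tiers with the recipient and n3 shares one, so n2 ranks first.
	sender, release := chooseSender(leader, pickDelegates([]replica{idle, busy}, recipient, 2), false)
	fmt.Println(sender.Name) // n3: n2 is busy and queueing is disabled
	release()

	sender, _ = chooseSender(leader, pickDelegates([]replica{idle, busy}, recipient, 0), false)
	fmt.Println(sender.Name) // n1: a follower count of 0 disables delegation
}
```

Failing fast when queueing is disabled keeps a busy delegate from stalling the transfer; the next candidate, or ultimately the leader, sends the snapshot instead.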
Force-pushed eb03985 to 760aedb.
bors r=nvanbenschoten
Build succeeded.
congrats!!
kvserver: delegate snapshots to followers