
kv: detect lease transfer and back off in DistSender #32877

Merged

Conversation

ajwerner
Contributor

@ajwerner ajwerner commented Dec 5, 2018

This PR addresses a problem which could lead to very long stalls in range
throughput when a lease transfer occurs under load. As soon as the
current lease holder begins a lease transfer, it rejects all future requests
to the range with a NotLeaseHolderError which contains the new lease
information. As soon as this happens, the new lease holder immediately begins
receiving requests but is not able to service them until it processes
the raft command that makes it the lease holder. Until it applies that command, it
returns NotLeaseHolderError with the previous lease information. Prior to this
change, the DistSender would immediately retry the request at the node indicated
in the most recent NotLeaseHolderError it had received. This leads to a tight
loop of requests bouncing between the current lease holder and the new lease
holder, which is unaware of the pending transfer (as observed in #22837). The
amount of load generated by this traffic can grind raft progress to a complete
halt, with the author observing multi-minute durations for the new node to
process a raft Ready and hundreds of milliseconds to process a single command.
Fortunately, the DistSender can detect when this situation is occurring and
back off accordingly.

This change detects that a replica is in the midst of a lease transfer by
noticing that it continues to receive NotLeaseHolderErrors without observing
a new lease sequence number. In this case, the DistSender backs off exponentially
until it succeeds, fails, or observes a new lease sequence.

Fixes #22837, Fixes #32367

Release note: None

@ajwerner ajwerner requested review from tbg, nvanbenschoten and a team December 5, 2018 23:16
@cockroach-teamcity
Member

This change is Reviewable

Member

@nvanbenschoten nvanbenschoten left a comment


:lgtm: but @tbg should give this a pass as well because he just reworked some of this.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/kv/dist_sender.go, line 97 at r1 (raw file):

		Unit:        metric.Unit_COUNT,
	}
	metaDistSenderInTransferBackoffsErrCount = metric.Metadata{

Might want to work the word "lease" in here.


pkg/kv/dist_sender_test.go, line 648 at r1 (raw file):

		},
	}
	for i, c := range []struct {

These tests are nice.

@ajwerner ajwerner force-pushed the ajwerner/dist-sender-backoff-in-transfer branch from 11e78d2 to 89d349a Compare December 6, 2018 21:15
Contributor Author

@ajwerner ajwerner left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/kv/dist_sender.go, line 97 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Might want to work the word "lease" in here.

Done.

Member

@tbg tbg left a comment


:lgtm:

Kind of amazing this managed to back things up for minutes. You were running with a large number of clients, right?

Reviewed 2 of 2 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale)

Contributor Author

@ajwerner ajwerner left a comment


TFYR! Concurrency was set to 4096

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale)

@ajwerner
Contributor Author

ajwerner commented Dec 7, 2018

bors r+

craig bot pushed a commit that referenced this pull request Dec 7, 2018
32877: kv: detect lease transfer and back off in DistSender r=ajwerner a=ajwerner


Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
@craig
Contributor

craig bot commented Dec 7, 2018

Build succeeded

@craig craig bot merged commit 89d349a into cockroachdb:master Dec 7, 2018