kvflowcontroller: eliminate mutex contention #109170
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Looks good to me. So the "sharding" happens now since you only lock the individual bucket and not the entire controller?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)
Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif)
pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go
line 70 at r1 (raw file):
// streams get closed permanently (tenants get deleted, nodes removed)
// or when completely inactive (no tokens deducted/returned over 30+
// minutes), clear these out.
IIUC, this per bucket mutex works trivially now because there is no concern that a bucket will be GC'd and then recreated for the same stream (since buckets are never GC'd), resulting in a race where tokens are added/subtracted from the stale bucket.
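For context, a minimal sketch of the layout being discussed (the names, fields, and token-accounting here are assumptions for illustration, not the actual CockroachDB code): each replication stream maps to a bucket that owns its mutex, and because entries are never deleted from the map, a bucket pointer fetched under the controller's read lock cannot go stale.

```go
package flowsketch

import "sync"

// Illustrative types only; the real controller keys buckets by replication
// stream and tracks tokens per work class.
type stream string

type bucket struct {
	mu     sync.Mutex // guards the token state for this stream only
	tokens int64
}

type controller struct {
	mu struct {
		sync.RWMutex
		buckets map[stream]*bucket // entries are never deleted, i.e. no GC
	}
}

func newController() *controller {
	c := &controller{}
	c.mu.buckets = make(map[stream]*bucket)
	return c
}

// adjust deducts or returns tokens for a single stream. Only that stream's
// bucket mutex is held for the adjustment, so different streams no longer
// contend on one controller-wide lock. Because buckets are never removed
// from the map, the pointer read under the read lock stays valid.
func (c *controller) adjust(s stream, delta int64) {
	c.mu.RLock()
	b, ok := c.mu.buckets[s]
	c.mu.RUnlock()
	if !ok {
		c.mu.Lock()
		if b, ok = c.mu.buckets[s]; !ok {
			b = &bucket{}
			c.mu.buckets[s] = b
		}
		c.mu.Unlock()
	}
	b.mu.Lock()
	b.tokens += delta
	b.mu.Unlock()
}
```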
pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller_metrics.go
line 162 at r1 (raw file):
for _, b := range c.mu.buckets {
	b.mu.Lock()
	sum += int64(b.tokensLocked(wc))
this can use tokens(wc)?
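A small sketch of what the suggestion amounts to (tokens and tokensLocked are the names from the diff; the bodies and types here are assumptions for illustration): tokens(wc) takes the bucket's own lock, so the metrics summation doesn't need the manual b.mu.Lock() plus tokensLocked(wc) pairing.

```go
package flowsketch

import "sync"

type workClass int

type bucket struct {
	mu struct {
		sync.Mutex
		tokensPerWorkClass map[workClass]int64
	}
}

// tokensLocked requires b.mu to be held by the caller.
func (b *bucket) tokensLocked(wc workClass) int64 {
	return b.mu.tokensPerWorkClass[wc]
}

// tokens is the self-locking variant; call sites such as the metrics loop
// can use it directly instead of locking the bucket themselves.
func (b *bucket) tokens(wc workClass) int64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.tokensLocked(wc)
}
```

The summation then reads sum += int64(b.tokens(wc)) with no explicit locking at the call site.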
Force-pushed from d1d8141 to d36d654.
Force-pushed from d36d654 to 4d81146.
So the "sharding" happens now since you only lock the individual bucket and not the entire controller?
Yes.
TFTR! bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)
pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go
line 70 at r1 (raw file):
Previously, sumeerbhola wrote…
IIUC, this per bucket mutex works trivially now because there is no concern that a bucket will be GC'd and then recreated for the same stream (since buckets are never GC'd), resulting in a race where tokens are added/subtracted from the stale bucket.
Yes. We can figure something out then - perhaps always grabbing a read lock when reading, or adding some synchronization state (a "gc-ed" marker) within each bucket.
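Purely to illustrate the second option floated here (a hypothetical sketch; nothing like this is in the PR, since buckets are never GC'd today): a per-bucket flag set when the bucket is removed, checked under the bucket's own mutex, with a retry of the lookup if the adjustment raced with the GC.

```go
package flowsketch

import "sync"

type bucket struct {
	mu     sync.Mutex
	gced   bool // hypothetical: set under mu when the bucket is GC'd from the map
	tokens int64
}

// adjust retries if it observes a bucket that was GC'd after the lookup.
// lookup stands in for fetching (or recreating) the bucket under the
// controller's lock.
func adjust(lookup func() *bucket, delta int64) {
	for {
		b := lookup()
		b.mu.Lock()
		if b.gced {
			b.mu.Unlock()
			continue // stale bucket; look it up again
		}
		b.tokens += delta
		b.mu.Unlock()
		return
	}
}
```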
pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller_metrics.go
line 162 at r1 (raw file):
Previously, sumeerbhola wrote…
this can use tokens(wc)?
Yes, done.
bors r+
Build succeeded:
Fixes #105508.
Under kv0/enc=false/nodes=3/cpu=96 we observed significant mutex contention on kvflowcontroller.Controller.mu. We were using a single mutex to adjust flow tokens across all replication streams. There's a natural sharding available here - by replication stream - that eliminates the contention and fixes the throughput drop.
The kv0 test surfaced other opportunities for performance optimization (mutex contention, allocated objects, etc.) that we'll address in subsequent PRs.
Release note: None