storage/concurrency: switch lockTableImpl to use the specialized copy-on-write btree #45276
Conversation
I'm out today, but I'm looking forward to reviewing this soon! Thanks for making the change.
Force-pushed from c008871 to 8080907.
Reviewed 5 of 5 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/storage/concurrency/lock_table.go, line 67 at r1 (raw file):
mu syncutil.RWMutex // Protects everything in this struct.
lockIDSeqNum uint64
Could you give this a comment?
pkg/storage/concurrency/lock_table.go, line 135 at r1 (raw file):
}

func newLockTable(maxLocks int64) lockTable {
Do you mind inlining this constructor in each of the callers, now that it's trivial? This is in line with our use of the rest of the objects in this package.
pkg/storage/concurrency/lock_table.go, line 218 at r1 (raw file):
// the lockStates in these snapshots may have been removed from
// lockTableImpl. Additionally, it is possible that there is a new lockState
// for the same key. This can result in various harmless anomalies:
Both of these anomalies, and "the case where a reservation of a latch holder is broken", would be avoided if we scanned and entered all conflicting lock wait-queues while holding the read-lock and then waited in each queue as a second phase after releasing the read-lock.
Have you given any more thought to this approach? We have previously discussed trade-offs in terms of fairness vs. throughput between the eager queueing approach and the lazy queueing approach. How would we evaluate which approach is better suited for this situation?
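[Editor's note: for concreteness, here is a minimal sketch of the two-phase ("eager queueing") alternative described above, using toy types and channels rather than the real lockTable structures; none of the names below come from the PR.]

```
package main

import (
	"fmt"
	"sync"
)

// Toy stand-ins: each conflicting lock has a wait-queue whose channel is
// closed when the lock is released.
type waitQueue struct{ released chan struct{} }

type lockTable struct {
	mu     sync.RWMutex // analogue of treeMu.mu
	queues map[string]*waitQueue
}

// eagerScanAndWait enters every conflicting wait-queue while holding the
// read lock (phase 1), then waits in each queue with the lock released
// (phase 2). Because scanning and enqueueing are atomic with respect to
// removals from the table, no stale-queue anomalies can arise.
func (t *lockTable) eagerScanAndWait(keys []string) {
	t.mu.RLock()
	var entered []*waitQueue
	for _, k := range keys {
		if q, ok := t.queues[k]; ok {
			entered = append(entered, q) // phase 1: join the queue
		}
	}
	t.mu.RUnlock()
	for _, q := range entered {
		<-q.released // phase 2: wait without holding the read lock
	}
}

func main() {
	t := &lockTable{queues: map[string]*waitQueue{
		"a": {released: make(chan struct{})},
	}}
	close(t.queues["a"].released) // the holder releases lock "a"
	t.eagerScanAndWait([]string{"a", "b"})
	fmt.Println("proceeded past all conflicting locks")
}
```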
pkg/storage/concurrency/lock_table.go, line 221 at r1 (raw file):
// - the request may hold a reservation on a lockState that is no longer
// in the tree. When it next does a scan, it will either find a new
// lockState where it will compete or none. Both lockStates can be in
compete -> complete?
pkg/storage/concurrency/lock_table.go, line 357 at r1 (raw file):
iter := tree.MakeIter()
// From here on, the use of resumingInSameSpan is just a performance
Did you explore the use of `FirstOverlap` and `NextOverlap` for this scan? It might eliminate some of the complexity of this comparison here, even before we introduce ranged locks into the lockTable.
Same question for below.
pkg/storage/concurrency/lock_table.go, line 423 at r1 (raw file):
type lockState struct {
	id     uint64 // needed for implementing util/interval/generic type contract
	endKey []byte // unused except for btree tests
This is unfortunate, but I see why you needed it.
pkg/storage/concurrency/lock_table.go, line 1488 at r1 (raw file):
if len(g.spans.GetSpans(sa, ss)) > 0 {
	t.locks[ss].mu.RLock()
	g.tableSnapshot[ss].Reset()
Don't we need to reset this if we don't hit this branch for either span access?
pkg/storage/concurrency/lock_table.go, line 1523 at r1 (raw file):
for i := 0; i < len(locksToGC); i++ {
	if len(locksToGC[i]) > 0 {
		sort.Sort(sortLocks(locksToGC[i]))
Why is this necessary? Could you leave a comment? This will perform a memory allocation, so we should justify the cost.
Force-pushed from 8080907 to a84774c.
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/concurrency/lock_table.go, line 67 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Could you give this a comment?
Done
pkg/storage/concurrency/lock_table.go, line 135 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do you mind inlining this constructor in each of the callers, now that it's trivial? This is in line with our use of the rest of the objects in this package.
Done
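[Editor's note: for readers following along, inlining a constructor this trivial amounts to writing a composite literal at each call site. A minimal sketch with a hypothetical, stripped-down struct (the real lockTableImpl has many more fields):]

```
package main

import "fmt"

// Hypothetical stand-in for lockTableImpl.
type lockTableImpl struct {
	maxLocks int64
}

// The trivial constructor being discussed.
func newLockTable(maxLocks int64) *lockTableImpl {
	return &lockTableImpl{maxLocks: maxLocks}
}

func main() {
	// After the suggested change, callers skip the constructor and use a
	// composite literal directly, matching the rest of the package.
	viaConstructor := newLockTable(10000)
	inlined := &lockTableImpl{maxLocks: 10000}
	fmt.Println(viaConstructor.maxLocks == inlined.maxLocks) // true
}
```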
pkg/storage/concurrency/lock_table.go, line 218 at r1 (raw file):
Both of these anomalies, and "the case where a reservation of a latch holder is broken", would be avoided if we scanned and entered all conflicting lock wait-queues while holding the read-lock and then waited in each queue as a second phase after releasing the read-lock.
Doing the scan while holding the read lock gives back some of the benefits of using this btree implementation.
But setting that aside: A non-transactional read/write or transactional read will be removed from the lockState once it no longer needs to wait, even if it is not actively waiting there. If that makes the lockState empty, that lockState could be removed from the primary btree but still exist in the snapshot. I think this is the only case where it can see a different lockState -- the one in the primary btree may have been recreated due to a lock acquisition.
Have you given any more thought to this approach? We have previously discussed trade-offs in terms of fairness vs. throughput between the eager queueing approach and the lazy queuing approach. How would we evaluate which approach is better suited for this situation?
I would prefer it if we experimentally evaluated the choice and used that to decide -- I view the "anomalies" listed here as not important enough to motivate one choice over another. If you have some experiments in mind, I am happy to run them. What do you think?
Meanwhile, I've added a TODO.
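[Editor's note: to make the anomaly concrete, here is a toy illustration of how a waiter's snapshot can reference a stale lockState after GC plus re-acquisition. Maps stand in for the copy-on-write btrees; this is not the actual lockTable code.]

```
package main

import "fmt"

type lockState struct {
	key string
	id  uint64
}

func main() {
	// The primary tree, and a snapshot taken while holding the read lock.
	primary := map[string]*lockState{"a": {key: "a", id: 1}}
	snapshot := map[string]*lockState{"a": primary["a"]}

	// The lock is released and the now-empty lockState is GC'd from the
	// primary tree; a later acquisition creates a fresh lockState for the
	// same key.
	delete(primary, "a")
	primary["a"] = &lockState{key: "a", id: 2}

	// A waiter still iterating its snapshot sees the stale object: same
	// key, different lockState. Harmless, but it explains why two
	// lockStates for one key can transiently coexist.
	fmt.Println(snapshot["a"].id, primary["a"].id) // prints: 1 2
}
```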
pkg/storage/concurrency/lock_table.go, line 221 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
compete -> complete?
It was meant to be "compete", since they are competing for a lock.
pkg/storage/concurrency/lock_table.go, line 357 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Did you explore the use of `FirstOverlap` and `NextOverlap` for this scan? It might eliminate some of the complexity of this comparison here, even before we introduce ranged locks into the lockTable. Same question for below.
I did not know about its existence. I've switched to it.
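[Editor's note: for readers unfamiliar with the generated iterator, the scan pattern looks roughly like the sketch below. The slice-backed iterator here only mimics the shape of the real API (FirstOverlap seeks to the first overlapping item, NextOverlap continues the scan); the real one walks the copy-on-write interval btree, and its exact signatures may differ.]

```
package main

import "fmt"

type lockState struct {
	startKey, endKey string // endKey == "" means a point lock
}

func (l *lockState) overlaps(o *lockState) bool {
	lEnd, oEnd := l.endKey, o.endKey
	if lEnd == "" {
		lEnd = l.startKey + "\x00"
	}
	if oEnd == "" {
		oEnd = o.startKey + "\x00"
	}
	return l.startKey < oEnd && o.startKey < lEnd
}

type iterator struct {
	items []*lockState // sorted by startKey
	pos   int
	scan  *lockState
}

// FirstOverlap seeks to the first item overlapping scan.
func (it *iterator) FirstOverlap(scan *lockState) {
	it.scan, it.pos = scan, -1
	it.NextOverlap()
}

// NextOverlap advances to the next overlapping item.
func (it *iterator) NextOverlap() {
	for it.pos++; it.pos < len(it.items); it.pos++ {
		if it.items[it.pos].overlaps(it.scan) {
			return
		}
	}
}

func (it *iterator) Valid() bool     { return it.pos < len(it.items) }
func (it *iterator) Cur() *lockState { return it.items[it.pos] }

func main() {
	it := &iterator{items: []*lockState{
		{startKey: "a"}, {startKey: "c"}, {startKey: "e"},
	}}
	// The scan pattern that replaces the manual resumingInSameSpan
	// comparisons: visit every lock overlapping the span [b, f).
	span := &lockState{startKey: "b", endKey: "f"}
	for it.FirstOverlap(span); it.Valid(); it.NextOverlap() {
		fmt.Println(it.Cur().startKey) // prints: c, then e
	}
}
```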
pkg/storage/concurrency/lock_table.go, line 423 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This is unfortunate, but I see why you needed it.
Now used for iteration too.
pkg/storage/concurrency/lock_table.go, line 1488 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Don't we need to reset this if we don't hit this branch for either span access?
The snapshot starts off in the empty state, and the spans for this request are const, so if it wasn't populated in the first scan it will never be populated.
pkg/storage/concurrency/lock_table.go, line 1523 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Why is this necessary? Could you leave a comment? This will perform a memory allocation, so we should justify the cost.
Memory allocation because of the type conversion? Won't this be on the stack?
I've removed this since seeking or calling `FirstOverlap` on an iterator does not take into account the current position, so iterating in order has no algorithmic benefit. It may help with caching, but there is no change to `BenchmarkLockTable/groups=1,outstanding=16,read=0`.
Reviewed 1 of 3 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)
pkg/storage/concurrency/lock_table.go, line 218 at r1 (raw file):
Doing the scan while holding the read lock gives back some of the benefits of using this btree implementation.
Yes, that is true. I thought through whether holding latches (but not the read lock) while iterating the snapshot would help us here, but it doesn't look like it. The only way to eliminate the anomalies is if we can synchronize all scans of the tree with all overlapping removals from the primary tree. Latches alone don't ensure this degree of synchronization.
What do you think?
That all sounds reasonable to me. The experiment that comes to mind is a workload with contended point reads (GetRequests) and ranged writes (DeleteRange). This workload will effectively test this fairness vs. throughput tradeoff.
I'm perfectly fine with a TODO for now.
pkg/storage/concurrency/lock_table.go, line 357 at r1 (raw file):
Previously, sumeerbhola wrote…
I did not know about its existence. I've switched to it.
Thanks. Unfortunately, I'm realizing now that the `lockState` you provided to `FirstOverlap` might be escaping to the heap and forcing an allocation. Do you mind checking? If so, we can either return to your previous approach or stash a tmp `lockState` on `lockTableGuardImpl` to avoid the allocation.
EDIT: I just tested something similar and suspect that it is forcing an allocation.
EDIT 2: I'll send you a PR to fix this API so that we don't run into this.
pkg/storage/concurrency/lock_table.go, line 1488 at r1 (raw file):
Previously, sumeerbhola wrote…
The snapshot starts off in the empty state, and the spans for this request are const, so if it wasn't populated in the first scan it will never be populated.
Oh, I see. That's really subtle.
pkg/storage/concurrency/lock_table.go, line 1523 at r1 (raw file):
Memory allocation because of the type conversion? Won't this be on the stack?
It will allocate because it's passed through a `sort.Interface`, which will force the slice header onto the heap.
It may help with caching but there is no change to BenchmarkLockTable/groups=1,outstanding=16,read=0.
Did removing this have an effect on the memory allocations in that benchmark?
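[Editor's note: to make the cost concrete, a small self-contained demo of where the allocation comes from. The element type is a placeholder, not the PR's actual sortLocks, but any named slice type passed through sort.Interface behaves the same way.]

```
package main

import (
	"fmt"
	"sort"
)

// Placeholder element type standing in for sortLocks.
type sortLocks []uint64

func (s sortLocks) Len() int           { return len(s) }
func (s sortLocks) Less(i, j int) bool { return s[i] < s[j] }
func (s sortLocks) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }

func main() {
	locks := []uint64{3, 1, 2}
	// The conversion sortLocks(locks) is free, but passing the value as a
	// sort.Interface boxes the slice header, and escape analysis sends
	// that copy to the heap: one small allocation per sort.Sort call.
	sort.Sort(sortLocks(locks))
	fmt.Println(locks) // [1 2 3]
}
```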
interval/generic: avoid letting argument to FirstOverlap escape to heap

Related to cockroachdb#45276. Prior to this change, the argument provided to `iterator.FirstOverlap` was stored in the iterator. This caused the argument to escape to the heap (if the parameterized type was a pointer) and would force an allocation if the argument was not already on the heap. This was not the intention and was causing issues in cockroachdb#45276. This commit fixes the issue by no longer storing the argument in the iterator's `overlapScan` object. This allows escape analysis to determine that the argument does not escape, like it was already doing for `iterator.SeekGE` and `iterator.SeekLT`.

```
name old time/op new time/op delta
BTreeIterFirstOverlap/count=16-16 400ns ± 1% 325ns ± 5% -18.75% (p=0.008 n=5+5)
BTreeIterFirstOverlap/count=128-16 661ns ± 0% 581ns ± 1% -12.02% (p=0.008 n=5+5)
BTreeIterFirstOverlap/count=1024-16 1.15µs ± 2% 1.07µs ± 2% -6.76% (p=0.008 n=5+5)
BTreeIterFirstOverlap/count=8192-16 1.84µs ± 2% 1.76µs ± 2% -4.38% (p=0.016 n=5+5)
BTreeIterFirstOverlap/count=65536-16 2.62µs ± 2% 2.52µs ± 1% -4.07% (p=0.008 n=5+5)
BTreeIterNextOverlap-16 11.1ns ± 2% 10.7ns ± 0% -3.95% (p=0.016 n=5+4)
BTreeIterOverlapScan-16 34.2µs ± 1% 31.2µs ± 1% -8.78% (p=0.008 n=5+5)

name old alloc/op new alloc/op delta
BTreeIterFirstOverlap/count=16-16 96.0B ± 0% 0.0B -100.00% (p=0.008 n=5+5)
BTreeIterFirstOverlap/count=128-16 96.0B ± 0% 0.0B -100.00% (p=0.008 n=5+5)
BTreeIterFirstOverlap/count=1024-16 96.0B ± 0% 0.0B -100.00% (p=0.008 n=5+5)
BTreeIterFirstOverlap/count=8192-16 96.0B ± 0% 0.0B -100.00% (p=0.008 n=5+5)
BTreeIterFirstOverlap/count=65536-16 96.0B ± 0% 0.0B -100.00% (p=0.008 n=5+5)
BTreeIterNextOverlap-16 0.00B 0.00B ~ (all equal)
BTreeIterOverlapScan-16 144B ± 0% 48B ± 0% -66.67% (p=0.008 n=5+5)

name old allocs/op new allocs/op delta
BTreeIterFirstOverlap/count=16-16 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5)
BTreeIterFirstOverlap/count=128-16 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5)
BTreeIterFirstOverlap/count=1024-16 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5)
BTreeIterFirstOverlap/count=8192-16 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5)
BTreeIterFirstOverlap/count=65536-16 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5)
BTreeIterNextOverlap-16 0.00 0.00 ~ (all equal)
BTreeIterOverlapScan-16 6.40 ± 9% 5.00 ± 0% -21.88% (p=0.008 n=5+5)
```
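[Editor's note: the fix is easy to see in miniature. In the sketch below (illustrative types only, not the util/interval/generic code), the old pattern stores the argument in the iterator, which forces the caller's value to the heap; the new pattern only reads the argument during the call.]

```
package main

import "fmt"

type item struct{ key int }

type iterator struct {
	stored *item // analogous to the field removed from overlapScan
}

// Old pattern: storing the argument in a longer-lived struct forces the
// compiler to assume it outlives the call, so the caller's value is
// typically moved to the heap (visible with `go build -gcflags=-m`).
func (it *iterator) firstOverlapOld(i *item) { it.stored = i }

// New pattern: the argument is only read during the call, so escape
// analysis can keep the caller's value on the stack.
func (it *iterator) firstOverlapNew(i *item) bool { return i.key >= 0 }

func main() {
	it := &iterator{}
	scratch := item{key: 42} // can stay on the stack with the new API
	fmt.Println(it.firstOverlapNew(&scratch))
	_ = it.firstOverlapOld // old pattern kept only for contrast
}
```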
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)
pkg/storage/concurrency/lock_table.go, line 357 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Thanks. Unfortunately, I'm realizing now that the `lockState` you provided to `FirstOverlap` might be escaping to the heap and forcing an allocation. Do you mind checking? If so, we can either return to your previous approach or stash a tmp `lockState` on `lockTableGuardImpl` to avoid the allocation.
EDIT: I just tested something similar and suspect that it is forcing an allocation.
EDIT 2: I'll send you a PR to fix this API so that we don't run into this.
This should be fixed by #45423.
Force-pushed from 92f2ba8 to 205d668.
storage/concurrency: switch lockTableImpl to use the specialized copy-on-write btree

The lockTableGuardImpl now keeps a snapshot of the relevant btrees and waits on the locks there, which narrows the critical section where treeMu.mu is held in read-mode.

Note that this introduces some harmless (from a correctness perspective, and hopefully from a performance perspective) anomalies where a request can wait on a stale lockState object. These are documented in the code. The switch to this btree did not eliminate the case where a reservation of a latch holder is broken -- the "basic" test has been tweaked to show how that can still happen.

Microbenchmarks are mostly better:

name old time/op new time/op delta
LockTable/groups=1,outstanding=1,read=0/-16 9.77µs ± 2% 8.51µs ± 2% -12.94% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=1,read=1/-16 8.68µs ± 2% 7.60µs ± 2% -12.46% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=1,read=2/-16 7.22µs ± 2% 6.79µs ± 9% ~ (p=0.151 n=5+5)
LockTable/groups=1,outstanding=1,read=3/-16 6.40µs ± 1% 5.75µs ± 1% -10.19% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=1,read=4/-16 4.23µs ± 1% 3.82µs ± 1% -9.84% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=1,read=5/-16 4.23µs ± 2% 3.79µs ± 1% -10.24% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=2,read=0/-16 14.2µs ± 4% 13.3µs ± 1% -6.32% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=2,read=1/-16 13.0µs ± 2% 12.3µs ± 6% ~ (p=0.095 n=5+5)
LockTable/groups=1,outstanding=2,read=2/-16 11.6µs ± 2% 10.5µs ±26% ~ (p=0.151 n=5+5)
LockTable/groups=1,outstanding=2,read=3/-16 10.4µs ± 2% 10.8µs ±30% ~ (p=0.690 n=5+5)
LockTable/groups=1,outstanding=2,read=4/-16 3.01µs ± 3% 2.84µs ±30% ~ (p=0.151 n=5+5)
LockTable/groups=1,outstanding=2,read=5/-16 3.37µs ± 7% 2.83µs ±12% -16.00% (p=0.016 n=5+5)
LockTable/groups=1,outstanding=4,read=0/-16 18.6µs ± 3% 15.9µs ± 4% -14.37% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=4,read=1/-16 16.5µs ± 4% 14.2µs ± 1% -13.89% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=4,read=2/-16 14.6µs ± 2% 12.8µs ± 0% -12.39% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=4,read=3/-16 11.8µs ± 3% 10.0µs ± 1% -14.77% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=4,read=4/-16 2.11µs ± 4% 1.73µs ±12% -18.01% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=4,read=5/-16 2.16µs ± 5% 1.86µs ±13% -13.78% (p=0.016 n=5+5)
LockTable/groups=1,outstanding=8,read=0/-16 24.6µs ± 3% 17.7µs ± 2% -27.92% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=8,read=1/-16 20.5µs ± 1% 16.0µs ± 1% -22.17% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=8,read=2/-16 17.8µs ± 4% 14.7µs ± 2% -17.58% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=8,read=3/-16 15.2µs ± 2% 12.1µs ± 4% -20.56% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=8,read=4/-16 2.14µs ±11% 1.36µs ±10% -36.53% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=8,read=5/-16 2.28µs ± 3% 1.60µs ± 7% -29.53% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=16,read=0/-16 28.3µs ±10% 27.5µs ± 8% ~ (p=0.690 n=5+5)
LockTable/groups=1,outstanding=16,read=1/-16 24.0µs ± 2% 22.9µs ± 5% ~ (p=0.151 n=5+5)
LockTable/groups=1,outstanding=16,read=2/-16 21.3µs ± 1% 22.0µs ± 8% ~ (p=0.103 n=5+5)
LockTable/groups=1,outstanding=16,read=3/-16 20.5µs ± 5% 19.1µs ± 8% ~ (p=0.095 n=5+5)
LockTable/groups=1,outstanding=16,read=4/-16 2.01µs ± 5% 1.34µs ± 4% -33.61% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=16,read=5/-16 2.09µs ± 2% 1.44µs ± 6% -31.10% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=1,read=0/-16 22.7µs ±10% 16.7µs ±25% -26.28% (p=0.016 n=5+5)
LockTable/groups=16,outstanding=1,read=1/-16 16.1µs ± 0% 10.8µs ± 2% -32.86% (p=0.029 n=4+4)
LockTable/groups=16,outstanding=1,read=2/-16 12.3µs ± 2% 9.6µs ±21% -21.76% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=1,read=3/-16 8.60µs ± 1% 7.94µs ±11% ~ (p=0.690 n=5+5)
LockTable/groups=16,outstanding=1,read=4/-16 1.82µs ± 7% 1.25µs ±21% -30.93% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=1,read=5/-16 1.89µs ± 3% 1.50µs ±14% -20.55% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=2,read=0/-16 28.2µs ± 8% 19.5µs ±11% -30.60% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=2,read=1/-16 21.7µs ± 3% 15.8µs ±20% -27.46% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=2,read=2/-16 14.8µs ± 1% 12.9µs ±39% ~ (p=0.151 n=5+5)
LockTable/groups=16,outstanding=2,read=3/-16 9.70µs ± 2% 8.42µs ± 1% -13.23% (p=0.016 n=5+4)
LockTable/groups=16,outstanding=2,read=4/-16 1.65µs ±11% 1.38µs ±18% -16.37% (p=0.032 n=5+5)
LockTable/groups=16,outstanding=2,read=5/-16 1.70µs ± 2% 1.60µs ±29% ~ (p=0.151 n=5+5)
LockTable/groups=16,outstanding=4,read=0/-16 26.0µs ± 7% 22.5µs ±11% -13.59% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=4,read=1/-16 21.9µs ± 3% 18.6µs ±24% ~ (p=0.151 n=5+5)
LockTable/groups=16,outstanding=4,read=2/-16 16.0µs ± 0% 16.3µs ±16% ~ (p=0.690 n=5+5)
LockTable/groups=16,outstanding=4,read=3/-16 12.5µs ± 4% 13.7µs ±14% +9.82% (p=0.032 n=5+5)
LockTable/groups=16,outstanding=4,read=4/-16 1.62µs ± 9% 1.42µs ±23% ~ (p=0.421 n=5+5)
LockTable/groups=16,outstanding=4,read=5/-16 1.68µs ± 2% 1.43µs ±25% ~ (p=0.151 n=5+5)
LockTable/groups=16,outstanding=8,read=0/-16 28.3µs ± 6% 24.8µs ±15% ~ (p=0.056 n=5+5)
LockTable/groups=16,outstanding=8,read=1/-16 23.6µs ± 3% 22.8µs ± 8% ~ (p=0.222 n=5+5)
LockTable/groups=16,outstanding=8,read=2/-16 17.8µs ± 2% 24.6µs ±23% +38.37% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=8,read=3/-16 15.2µs ± 3% 15.0µs ± 9% ~ (p=0.151 n=5+5)
LockTable/groups=16,outstanding=8,read=4/-16 1.65µs ±11% 1.95µs ±39% ~ (p=0.151 n=5+5)
LockTable/groups=16,outstanding=8,read=5/-16 1.73µs ± 1% 1.42µs ±10% -17.67% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=16,read=0/-16 33.1µs ± 6% 25.5µs ±13% -23.04% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=16,read=1/-16 27.2µs ± 2% 21.3µs ± 1% -21.57% (p=0.016 n=5+4)
LockTable/groups=16,outstanding=16,read=2/-16 20.8µs ± 0% 24.2µs ±19% ~ (p=0.190 n=4+5)
LockTable/groups=16,outstanding=16,read=3/-16 18.6µs ± 2% 20.4µs ± 6% +9.69% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=16,read=4/-16 1.68µs ± 4% 1.59µs ± 7% ~ (p=0.095 n=5+5)
LockTable/groups=16,outstanding=16,read=5/-16 1.75µs ± 1% 1.36µs ± 7% -22.61% (p=0.008 n=5+5)

name old alloc/op new alloc/op delta
LockTable/groups=1,outstanding=1,read=0/-16 7.30kB ± 0% 4.11kB ± 0% -43.68% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=1,read=1/-16 6.34kB ± 0% 3.63kB ± 0% -42.73% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=1,read=2/-16 5.38kB ± 0% 3.15kB ± 0% -41.44% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=1,read=3/-16 4.41kB ± 0% 2.67kB ± 0% -39.56% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=1,read=4/-16 2.55kB ± 0% 1.28kB ± 0% -49.71% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=1,read=5/-16 2.55kB ± 0% 1.28kB ± 0% -49.71% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=2,read=0/-16 9.38kB ± 0% 5.16kB ± 0% -45.00% (p=0.000 n=5+4)
LockTable/groups=1,outstanding=2,read=1/-16 8.41kB ± 0% 4.67kB ± 0% -44.45% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=2,read=2/-16 7.45kB ± 0% 4.19kB ± 0% -43.73% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=2,read=3/-16 6.46kB ± 0% 3.40kB ± 0% -47.36% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=2,read=4/-16 2.55kB ± 0% 1.28kB ± 0% -49.69% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=2,read=5/-16 2.55kB ± 0% 1.28kB ± 0% -49.69% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=4,read=0/-16 9.46kB ± 0% 5.21kB ± 0% -44.95% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=4,read=1/-16 8.50kB ± 0% 4.75kB ± 0% -44.13% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=4,read=2/-16 7.53kB ± 0% 4.28kB ± 0% -43.22% (p=0.016 n=4+5)
LockTable/groups=1,outstanding=4,read=3/-16 6.55kB ± 0% 3.47kB ± 0% -47.01% (p=0.029 n=4+4)
LockTable/groups=1,outstanding=4,read=4/-16 2.55kB ± 0% 1.28kB ± 0% -49.68% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=4,read=5/-16 2.55kB ± 0% 1.28kB ± 0% -49.71% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=8,read=0/-16 9.47kB ± 0% 5.31kB ± 0% -43.86% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=8,read=1/-16 8.52kB ± 0% 4.83kB ± 0% -43.29% (p=0.016 n=4+5)
LockTable/groups=1,outstanding=8,read=2/-16 7.58kB ± 0% 4.35kB ± 0% -42.61% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=8,read=3/-16 6.62kB ± 0% 3.52kB ± 0% -46.76% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=8,read=4/-16 2.55kB ± 0% 1.28kB ± 0% -49.68% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=8,read=5/-16 2.55kB ± 0% 1.28kB ± 0% -49.69% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=16,read=0/-16 9.43kB ± 0% 5.49kB ± 1% -41.77% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=16,read=1/-16 8.44kB ± 0% 5.10kB ± 0% -39.53% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=16,read=2/-16 7.47kB ± 0% 4.69kB ± 1% -37.20% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=16,read=3/-16 6.81kB ± 0% 3.95kB ± 1% -41.94% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=16,read=4/-16 2.55kB ± 0% 1.28kB ± 0% -49.69% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=16,read=5/-16 2.55kB ± 0% 1.28kB ± 0% -49.69% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=1,read=0/-16 7.35kB ± 0% 4.86kB ± 0% -33.96% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=1,read=1/-16 6.38kB ± 0% 4.03kB ± 0% -36.84% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=1,read=2/-16 5.42kB ± 0% 3.45kB ± 0% -36.40% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=1,read=3/-16 4.45kB ± 0% 2.96kB ± 0% -33.43% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=1,read=4/-16 2.55kB ± 0% 1.28kB ± 0% -49.69% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=1,read=5/-16 2.55kB ± 0% 1.28kB ± 0% -49.69% (p=0.000 n=5+4)
LockTable/groups=16,outstanding=2,read=0/-16 9.42kB ± 0% 5.67kB ± 0% -39.80% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=2,read=1/-16 8.43kB ± 0% 4.90kB ± 1% -41.93% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=2,read=2/-16 6.63kB ± 0% 3.67kB ± 0% -44.70% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=2,read=3/-16 5.04kB ± 0% 3.12kB ± 0% -38.09% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=2,read=4/-16 2.55kB ± 0% 1.28kB ± 0% -49.69% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=2,read=5/-16 2.55kB ± 0% 1.28kB ± 0% -49.69% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=4,read=0/-16 9.42kB ± 0% 6.09kB ± 0% -35.35% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=4,read=1/-16 8.50kB ± 0% 5.27kB ± 0% -38.03% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=4,read=2/-16 7.34kB ± 0% 4.08kB ± 0% -44.43% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=4,read=3/-16 6.13kB ± 0% 3.38kB ± 0% -44.85% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=4,read=4/-16 2.55kB ± 0% 1.28kB ± 0% -49.69% (p=0.000 n=5+4)
LockTable/groups=16,outstanding=4,read=5/-16 2.55kB ± 0% 1.28kB ± 0% -49.69% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=8,read=0/-16 9.40kB ± 0% 6.06kB ± 0% -35.52% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=8,read=1/-16 8.50kB ± 0% 5.22kB ± 1% -38.57% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=8,read=2/-16 7.49kB ± 0% 4.16kB ± 0% -44.45% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=8,read=3/-16 6.56kB ± 0% 3.47kB ± 0% -47.16% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=8,read=4/-16 2.55kB ± 0% 1.28kB ± 0% -49.73% (p=0.029 n=4+4)
LockTable/groups=16,outstanding=8,read=5/-16 2.55kB ± 0% 1.28kB ± 0% -49.70% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=16,read=0/-16 9.46kB ± 0% 5.72kB ± 1% -39.54% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=16,read=1/-16 8.60kB ± 1% 5.09kB ± 1% -40.79% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=16,read=2/-16 7.83kB ± 0% 4.22kB ± 0% -46.14% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=16,read=3/-16 6.92kB ± 0% 3.59kB ± 0% -48.13% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=16,read=4/-16 2.55kB ± 0% 1.28kB ± 0% -49.69% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=16,read=5/-16 2.55kB ± 0% 1.28kB ± 0% -49.71% (p=0.000 n=5+4)

name old allocs/op new allocs/op delta
LockTable/groups=1,outstanding=1,read=0/-16 34.0 ± 0% 21.0 ± 0% -38.24% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=1,read=1/-16 28.0 ± 0% 17.0 ± 0% -39.29% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=1,read=2/-16 22.0 ± 0% 13.0 ± 0% -40.91% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=1,read=3/-16 16.0 ± 0% 9.0 ± 0% -43.75% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=1,read=4/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=1,read=5/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=2,read=0/-16 41.0 ± 0% 24.0 ± 0% -41.46% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=2,read=1/-16 35.0 ± 0% 20.0 ± 0% -42.86% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=2,read=2/-16 29.0 ± 0% 16.0 ± 0% -44.83% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=2,read=3/-16 23.0 ± 0% 11.0 ± 0% -52.17% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=2,read=4/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=2,read=5/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=4,read=0/-16 42.0 ± 0% 24.0 ± 0% -42.86% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=4,read=1/-16 36.0 ± 0% 21.0 ± 0% -41.67% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=4,read=2/-16 30.0 ± 0% 17.0 ± 0% -43.33% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=4,read=3/-16 23.0 ± 0% 11.0 ± 0% -52.17% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=4,read=4/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=4,read=5/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=8,read=0/-16 42.0 ± 0% 25.0 ± 0% -40.48% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=8,read=1/-16 36.0 ± 0% 21.0 ± 0% -41.67% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=8,read=2/-16 30.0 ± 0% 17.0 ± 0% -43.33% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=8,read=3/-16 24.0 ± 0% 12.0 ± 0% -50.00% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=8,read=4/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=8,read=5/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=16,read=0/-16 42.0 ± 0% 25.0 ± 0% -40.48% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=16,read=1/-16 36.0 ± 0% 22.0 ± 0% -38.89% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=16,read=2/-16 30.0 ± 0% 18.6 ± 3% -38.00% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=16,read=3/-16 25.0 ± 0% 14.0 ± 0% -44.00% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=16,read=4/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=1,outstanding=16,read=5/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=1,read=0/-16 34.0 ± 0% 23.0 ± 0% -32.35% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=1,read=1/-16 28.0 ± 0% 18.0 ± 0% -35.71% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=1,read=2/-16 22.0 ± 0% 14.0 ± 0% -36.36% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=1,read=3/-16 16.0 ± 0% 10.0 ± 0% -37.50% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=1,read=4/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=1,read=5/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=2,read=0/-16 43.0 ± 0% 27.0 ± 0% -37.21% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=2,read=1/-16 36.0 ± 0% 21.0 ± 0% -41.67% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=2,read=2/-16 26.4 ± 2% 14.0 ± 0% -46.97% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=2,read=3/-16 18.0 ± 0% 10.0 ± 0% -44.44% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=2,read=4/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=2,read=5/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=4,read=0/-16 43.0 ± 0% 28.0 ± 0% -34.88% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=4,read=1/-16 37.0 ± 0% 22.0 ± 0% -40.54% (p=0.000 n=5+4)
LockTable/groups=16,outstanding=4,read=2/-16 29.0 ± 0% 16.0 ± 0% -44.83% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=4,read=3/-16 22.0 ± 0% 11.0 ± 0% -50.00% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=4,read=4/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=4,read=5/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=8,read=0/-16 43.0 ± 0% 27.0 ± 0% -37.21% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=8,read=1/-16 37.0 ± 0% 21.0 ± 0% -43.24% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=8,read=2/-16 29.0 ± 0% 15.6 ± 4% -46.21% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=8,read=3/-16 23.0 ± 0% 11.0 ± 0% -52.17% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=8,read=4/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=8,read=5/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=16,read=0/-16 43.0 ± 0% 24.0 ± 0% -44.19% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=16,read=1/-16 37.0 ± 0% 20.0 ± 0% ~ (p=0.079 n=4+5)
LockTable/groups=16,outstanding=16,read=2/-16 30.0 ± 0% 15.0 ± 0% -50.00% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=16,read=3/-16 24.0 ± 0% 11.0 ± 0% -54.17% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=16,read=4/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)
LockTable/groups=16,outstanding=16,read=5/-16 9.00 ± 0% 4.00 ± 0% -55.56% (p=0.008 n=5+5)

Release note: None
Force-pushed from 205d668 to d3948f1.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/storage/concurrency/lock_table.go, line 218 at r1 (raw file):
The experiment that comes to mind is a workload with contended point reads (GetRequests) and ranged writes (DeleteRange). This workload will effectively test this fairness vs. throughput tradeoff.
Do we have an existing workload like this? If not, and I wanted to construct one, can you suggest some existing examples to read?
pkg/storage/concurrency/lock_table.go, line 357 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This should be fixed by #45423.
Thanks
pkg/storage/concurrency/lock_table.go, line 1488 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Oh, I see. That's really subtle.
I've added a comment.
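[Editor's note: the invariant being documented can be shown in miniature. The sketch below uses illustrative names only, not the lockTable code: a request's declared spans never change, so a snapshot scope that stays empty on the first scan can never become stale, and no unconditional reset is needed.]

```
package main

import "fmt"

// Two span scopes, standing in for the lockTable's per-scope trees.
type scope int

const (
	global scope = iota
	local
	numScopes
)

type guard struct {
	spans    [numScopes][]string // const for the request's lifetime
	snapshot [numScopes][]string // starts empty
}

// scan refreshes the snapshot only for scopes the request declared.
func (g *guard) scan(tree [numScopes][]string) {
	for s := global; s < numScopes; s++ {
		if len(g.spans[s]) > 0 {
			g.snapshot[s] = append(g.snapshot[s][:0], tree[s]...)
		}
	}
}

func main() {
	var tree [numScopes][]string
	tree[global] = []string{"a", "b"}
	tree[local] = []string{"x"}

	g := &guard{}
	g.spans[global] = []string{"a"} // declares only global spans
	g.scan(tree)
	g.scan(tree) // re-scan: the local snapshot is still untouched
	fmt.Println(g.snapshot[global], g.snapshot[local]) // [a b] []
}
```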
pkg/storage/concurrency/lock_table.go, line 1523 at r1 (raw file):
Did removing this have an effect on the memory allocations in that benchmark?
Hmm, looking at the history of runs, it did not change the memory allocations.
bors r+
Build succeeded