[DNM] storage/concurrency: introduce concurrency control package, prototype SFU #43775

Conversation

nvanbenschoten
Member

Informs #41720.

The PR creates a new concurrency package that provides a concurrency manager structure that encapsulates the details of concurrency control and contention handling for serializable key-value transactions. Interested readers should start at concurrency_control.go and move out from there.

The new package has a few primary objectives:

  1. centralize the handling of request synchronization and transaction contention handling in a single location, allowing for the topic to be documented, understood, and tested in isolation.
  2. rework contention handling to react to intent state transitions directly. This simplifies the transaction queueing story, reduces the frequency of transaction push RPCs, and allows waiters to proceed immediately after intent resolution.
  3. create a framework that naturally permits "update" locking, which is required for kv-level SELECT FOR UPDATE support (sql: explicit lock syntax (SELECT FOR {SHARE,UPDATE} {skip locked,nowait}) #6583).
  4. provide stronger guarantees around fairness when transactions conflict, to reduce tail latencies under contended scenarios.
  5. create a structure that can extend to address the long-term goals of a fully centralized lock-table laid out in storage: separated lock-table keyspace #41720.
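
To make the intended flow concrete, here is a hedged sketch of how a request might pass through the manager: sequence, evaluate, release. The types and the finishReq method are illustrative stand-ins rather than the package's actual API (which lives in concurrency_control.go); only SequenceReq is a name that appears later in this review.

```go
package sketch

import "context"

type request struct{} // spans, txn, locking strength, ...

// guard tracks the latches and wait-queue positions a request holds while
// it is sequenced and evaluated.
type guard struct{}

// manager is an illustrative stand-in; the real interface is defined in
// concurrency_control.go and differs in detail.
type manager interface {
	// SequenceReq acquires latches, scans the lock table, and waits in lock
	// wait-queues until the request no longer conflicts with in-flight requests.
	SequenceReq(ctx context.Context, g *guard, req request) (*guard, error)
	// finishReq exits any wait-queues the request heads and releases its latches.
	finishReq(g *guard)
}

func run(ctx context.Context, m manager, req request) error {
	g, err := m.SequenceReq(ctx, nil /* no existing guard */, req)
	if err != nil {
		return err
	}
	defer m.finishReq(g)
	// Evaluate the request here, isolated from all conflicting requests.
	return nil
}
```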

Select For Update Prototype Results

The last commit in the PR contains a hacked-in version of upgrade locking hooked into the new concurrency package. This allows us to run several experiments that explore how effectively it delivers its anticipated benefits.

Below is a collection of tests using YCSB. All experiments were run on a cluster of 3 m5d.4xlarge VMs with local SSDs. The tests used a hacked version of YCSB which contains an additional workload "U", which is similar to workload "A" except it performs UPDATEs 100% of the time. A new request distribution called "single" was also added, which causes all requests to operate on a single row, though they are still spread evenly across all 10 column families in that row.

NOTE that these numbers compare master against a completely unoptimized implementation of an in-memory lock table. The implementation uses a single top-level mutex and makes no attempt to avoid memory allocations or optimize for common cases. There is some low-hanging fruit here that would improve its performance measurably.
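
For context, a minimal sketch of the shape described above, assuming string transaction IDs and raw string keys (neither matches the prototype's real types): one top-level mutex guarding a map of per-key lock state.

```go
package sketch

import "sync"

// lockState is the in-memory state for a single key's lock: the current
// holder plus a FIFO wait-queue.
type lockState struct {
	holderTxn string
	waitQueue []string
}

// lockTable serializes every operation on one top-level mutex and allocates
// a fresh lockState per locked key; both are the low-hanging fruit noted above.
type lockTable struct {
	mu    sync.Mutex
	locks map[string]*lockState
}

func newLockTable() *lockTable {
	return &lockTable{locks: make(map[string]*lockState)}
}

// acquire takes the lock on key for txn, or joins the key's wait-queue.
// It reports whether the caller must wait.
func (t *lockTable) acquire(key, txn string) (mustWait bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	ls, ok := t.locks[key]
	if !ok {
		t.locks[key] = &lockState{holderTxn: txn}
		return false
	}
	ls.waitQueue = append(ls.waitQueue, txn)
	return true
}
```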

Throughput Under Contention

Improving the throughput of heavily contended workloads is a recurring goal. The combination of the concurrency package and SFU locking (implicit for UPDATE statements) promises to move towards this goal for heavily contended UPDATE workloads because it avoids thrashing and other wasted work.

Here we compare master against this prototype with three different YCSB variants:

- workload U, single distribution,  64 threads
- workload U, zipfian distribution, 64 threads
- workload A, zipfian distribution, 64 threads

Each variant was tested with each binary over a collection of 3 two-minute trials and the throughput recorded in each trial was averaged together.

[graph: throughput]

We see in the U / single / 64 workload that SFU significantly improves the throughput of highly contended workloads, in this case by 49%. This is likely because of more efficient queueing and fewer transaction retries. The improvement goes away with less contention and only writes (U / zipf / 64), but comes back to some degree (5%) when reads are added into the mix (A / zipf / 64). My best theory is that upgrade locking improves fairness between readers and writers (see below), which helps prevent starvation of writers.

We'd also expect this to have a larger impact on bigger machines that can tolerate more contention in general.

Fairness Under Contention (i.e. Tail Latencies)

An issue identified early in the SFU work was that the current state of contention handling in Cockroach has serious fairness issues. There is very little queueing, so transactions typically race to lay down intents as they repeatedly hit WriteTooOld errors or run into each other's intents. The concurrency package was meant to improve this situation, and SFU improves on it further by avoiding thrashing between reading and writing during an UPDATE operation.

We can see the effect of this by comparing the tail latencies when running YCSB against master and this prototype:

[graph: p95 latency]

[graph: p99 latency]

In this comparison, we see that tail latencies currently explode as contention increases (U / single / 64 is the most contended variant). However, the use of SFU avoids this explosion in tail latencies, maintaining a much tighter spread in transaction latencies across the workload variants. This is a clear demonstration of the effectiveness of this new approach.

Transaction Retries

The third goal of the SELECT FOR UPDATE work is to provide users with the ability to avoid transaction retries in certain scenarios where they're performing a read and then an update to the same row. This is what UPDATE statements do internally, and it turns out that today, UPDATE statements on their own can cause transaction retries (see the "Transaction retries in TPC-C" Google group thread) if they read a value that changes before they write to it. We can again explore this situation using YCSB.

In this experiment, we ran YCSB-A (A / zipf / 64) without any modifications and monitored the number of transaction retries. Because YCSB runs single-statement transactions, all of these would be retries on the SQL gateway, but these statements could just as easily be run in explicit transactions, in which case these retries would make it back to the client. For practical reasons, I started running with the new SFU prototype and then switched back to master halfway through the test (around 5:25). Take note of the KV Transaction Restart graph.

[screenshot: KV Transaction Restart graph, 2020-01-07 12:27 AM]

We see essentially no transaction restarts when using the SFU-enhanced prototype. However, when we switch back to master, we see restarts jump up to around 140 per second across three classes of retry reasons. This confirms that in cases like these, SELECT FOR UPDATE will be an effective tool in reducing or eliminating transaction retries.
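
For illustration, here is the read-then-update pattern discussed above as an explicit transaction from a Go client. The connection string and the kv(k STRING PRIMARY KEY, v INT) schema are placeholders; the point is that SELECT ... FOR UPDATE makes later writers queue behind this transaction instead of invalidating its read and forcing a retry.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // placeholder Postgres-wire driver
)

func main() {
	// Placeholder connection string and schema.
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/test?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	tx, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}
	var v int
	// Locking read: conflicting writers now wait behind this transaction
	// rather than changing the row and triggering a retry at commit.
	if err := tx.QueryRow(`SELECT v FROM kv WHERE k = $1 FOR UPDATE`, "a").Scan(&v); err != nil {
		log.Fatal(err)
	}
	if _, err := tx.Exec(`UPDATE kv SET v = $1 WHERE k = $2`, v+1, "a"); err != nil {
		log.Fatal(err)
	}
	if err := tx.Commit(); err != nil {
		log.Fatal(err)
	}
}
```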


@petermattis
Collaborator

Exciting improvements here. How stable are these results? I ask because in recent experimentation with ycsb/A on Pebble, I noticed ± 10% throughput variation when running on master/RocksDB. I ended up needing to run each test 10-20 times, format the results in a manner compatible with benchstat and then use benchstat to tease out the delta. ycsb/A seemed more prone to this variation than some of the kv0 and kv95 tests, presumably because of the contention (though I didn't verify that).

@nvanbenschoten
Member Author

How stable are these results?

The throughput numbers across the trials were varying by about ± 2-3%. The latency percentiles were more stable across trials, which looks to be because they're quantized. I'm actually hopeful that the fairness improvements here will help reduce variance and make YCSB more stable.

// in the MVCC keyspace. This is performed transparently by a lockAwareIter.
//
// When the request completes, it exits any lockWaitQueues that it was at the
// head of and releases its latches. However, any locks that it inserted into
Member

Only on success, right? If the request tries to write an intent but this somehow fails below Raft, the lock should go away? Or I guess it's fine to have an in-mem lock too many, then we'll just push more than we need to (hope that doesn't confuse deadlock detection somehow). Hmm, SFU will basically mean having in-mem locks that aren't backed by replicated locks, so it will all just have to work anyway.

Member Author

Only on success, right? If the request tries to write an intent but this somehow fails below Raft, the lock should go away?

Yes, documented.

Hmm, SFU will basically mean having in-mem locks that aren't backed by replicated locks, so it will all just have to work anyway.

Yeah, we'll have to make it work. For now, though, I'm not going to worry about inserting locks on failure paths.

// concurrency manager about the replicated write intent that was missing
// from its lock table. After doing so, it enqueues the request that hit the
// error in the lock's wait-queue (but does not wait) and releases the
// guard's latches. It returns an updated guard reflecting this change.
Member

When these methods return a new *Guard, does it mean that the original *Guard must no longer be used?

Member Author

Done.

HandleWriterIntentError(context.Context, *Guard, *roachpb.WriteIntentError) (*Guard, *Error)
// HandleTransactionPushError consumes a TransactionPushError by informing
// the concurrency manager about a transaction record that could not be
// pushes. After doing so, it releases the guard's latches. It returns an
Member

pushed

Member Author

Done.

// been updated. The argument indicates whether this manager's replica is
// the leaseholder going forward.
OnLeaseUpdated(bool)
// OnSplit informs the concurrency manager that its range has split of a new
Member

off

Member Author

Done.

// encountered.
ScanAndEnqueue(Request, *Guard) []lockWaitQueueGuard
// Dequeue removes the guard from its lock wait-queue.
Dequeue(lockWaitQueueGuard)
Member

Is this called if a waiter gives up? In normal operation we'll only ever dequeue in FIFO order, right?

Member Author

Yes, this can be called if a waiter gives up, so we'll need to handle out-of-order dequeueing. Under normal operation though, you're correct. Commented.

// Dequeue removes the guard from its lock wait-queue.
Dequeue(lockWaitQueueGuard)
// SetBounds sets the key bounds of the lockTable.
SetBounds(start, end roachpb.RKey)
Member

Curious why this is needed.

Member Author

It's not needed right now, but it will be needed once we pull the replicated portion of the lockTable under here. I already had the wiring hooked up to the rest of the Storage package, which I think is valuable. I'll remove this for now though.

}

// Wait on each newly conflicting lock, if applicable.
if m.lwq.MustWaitOnAny(g.wqgs) {
Member

Could this be wqgs (which I know isn't in scope here)? Just trying to confirm my understanding that at the beginning of each iter in the for loop g.wqgs has been WaitOn'ed already and is only carried with the guard for .. other reasons.

Member Author

Originally that would have worked, but not anymore because #43740 introduced the notion of "breaking" reservations. Essentially this means that waiters can lose their place at the front of a wait-queue while not holding latches. It is only safe to conclude that a waiter is at the front of all wait-queues if it is holding latches.

The reason for this is pretty subtle, but @sumeerbhola does a good job documenting it in that PR. Because we don't hold range locks while waiting in per-key wait-queues, we run into situations like the following. It's possible that two ranged writes txn1 and txn2 over keys [a, d) might sequence such that txn1 initially acquires latches first. It finds a lock on key c and begins waiting. A different txn3 might then get sequenced and perform a point write on key a. When txn2 is sequenced, it will enter the wait-queue for a (@ front) and c (behind txn1). When txn1 re-acquires latches and re-scans the lockTable, it will find the lock on a and enter its wait-queue. We don't want it getting behind txn2 in the queue or we'd cause a deadlock.

So, the solution Sumeer came up with to address this was to give each request a monotonically increasing sequence number when it first scans the lockTable. This creates a total order over requests. The order does not tell us exactly what order requests will finish being sequenced by the concurrency manager. However, it does tell us how to order conflicting transactions in a given wait-queue. In other words, the policy for entering a wait-queue is not that requests automatically get in the back. Instead, requests scan from the back to find where they fit in the sequence-ordered list of requests. This avoids the kinds of deadlocks discussed above.

However, to get it to work, we need to abandon the idea that once a request is at the front of a wait-queue, it stays there. Instead, requests must check all of their wait-queue entries on each iteration through the loop while holding latches. A request is only safe to proceed when it observes that it is at the front of all of its wait-queues during a single pass.

The correctness of this is subtle but I think it's mentioned in #43740. A consequence of it is that as a req with a low seq num, we need to be very careful about never breaking the reservation of another request whose latch set might be compatible with it, because latching is what's providing the mutual exclusion to atomically determine that a request holds a reservation for (is at the front of) all wait-queues that it's a part of. The fact that non-locking (lock.None) reads are never going to hold reservations in wait-queues makes this easier to think through.
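
A hedged sketch of that enqueue policy, with a simplified waiter type; #43740 has the real version:

```go
package sketch

// waiter is a simplified stand-in for a request in a lock wait-queue. Each
// request is assigned a monotonically increasing seqNum on its first scan
// of the lockTable.
type waiter struct {
	seqNum uint64
}

// enqueue inserts w into q, which is kept in ascending seqNum order. It
// scans from the back because new waiters usually carry the highest
// sequence numbers, so the common case is a plain append.
func enqueue(q []waiter, w waiter) []waiter {
	i := len(q)
	for i > 0 && q[i-1].seqNum > w.seqNum {
		i--
	}
	q = append(q, waiter{}) // grow by one slot
	copy(q[i+1:], q[i:])    // shift the tail right
	q[i] = w
	return q
}
```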

// want to wait on locks (like PutRequest and ScanRequest) and then there
// are those that don't want to wait on locks (like QueryIntentRequest,
// RefreshRequest, and ResolveIntentRequest). Should we define a flag for
// this?
Member

Rather than trying to keep these "internal" request types side-by-side with the "public" KV API (Scan, Put, ...) we should establish a boundary between them. I feel that the flags have not been helping that, and they also tend to obfuscate understanding.

Member Author

I've also grown a little disillusioned about flags for something like this because they're defined so far away from where they're used. I'm curious what you're thinking in terms of a boundary.

switch arg.Method() {
case roachpb.HeartbeatTxn:
case roachpb.Refresh:
case roachpb.RefreshRange:
Member

I would've expected to see ResolveIntent{,Range} here.

Member Author

Somewhat surprisingly, ResolveIntent{,Range} is not roachpb.IsTransactional. Neither of the requests has the isTxn flag.

// Wait on each new txnWaitQueue entry.
// TODO: should we do this here?
for _, wqg := range g.wqgs[orig:] {
if err := m.lwq.WaitOn(ctx, g.req, wqg); err != nil {
Member

This is the case in which I find out about a lock only from the replicated key space, right? And there's the other case in which we discover locks directly from the lock table. I would've expected that we'd wait in SequenceReq in all cases, i.e. HandleWriterIntentError would be called and afterwards the caller would pass the resulting guard to SequenceReq. (This would also explain why my assumption in that method about only having to wait on new wqgs is wrong).

Member Author

Yes, that's exactly how things are supposed to work. I think I just forgot to address this TODO and delete the code. Done.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jan 16, 2020
This commit introduces a new `concurrency/lock` package that provides
type definitions for locking-related concepts used by concurrency
control in the key-value layer. The package initially provides a
lock `Strength` enumeration consisting of `Shared`, `Upgrade`, and
`Exclusive` and a lock `Durability` enumeration consisting of
`Unreplicated` and `Replicated`.

These definitions are currently unused but will be referenced soon
by cockroachdb#43740 and cockroachdb#43775. The changes around SELECT FOR UPDATE in the
optimizer (cockroachdb#44015) are also approaching the point where they'll
soon need the type definitions to interface with the KV API.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jan 17, 2020
craig bot pushed a commit that referenced this pull request Jan 21, 2020
43862: storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC r=tbg a=ajwerner

This PR comes in 6 commits. See the commit messages for more details. The meat of the change is in: `storage: rework RunGC so it no longer buffers keys and values in memory`.

1) Fix the spanset Iterator such that SeekLT can seek to the end of the range.
2) Expose reverse iteration in rditer.ReplicaDataIterator.
3) Rearrange some code in gc_queue.go to put the RunGC() helpers below that function.
4) Remove lockableGCInfo because there was no need for it.
5) Move RunGC into a `gc` subpackage.
6) Rework RunGC to paginate versions of keys with reverse iteration.


Fixes #42531.

44054: storage/concurrency/lock: define lock Strength and Durability modes r=nvanbenschoten a=nvanbenschoten

This commit introduces a new `concurrency/lock` package that provides type definitions for locking-related concepts used by concurrency control in the key-value layer. The package initially provides a lock `Strength` enumeration consisting of `Shared`, `Upgrade`, and `Exclusive` and a lock `Durability` enumeration consisting of `Unreplicated` and `Replicated`.

These definitions are currently unused but will be referenced soon by #43740 and #43775. The changes around SELECT FOR UPDATE in the optimizer (#44015) are also approaching the point where they'll soon need the type definitions to interface with the KV API.

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
…cher

Relates to cockroachdb#40205.

This commit is a follow-up to cockroachdb#44015. It propagates the row-level locking modes
from the optbuilder all the way down to the row.Fetcher. This requires passing
the locking information through the execbuilder and then through DistSQL. The
commit leaves off at the point where `roachpb.ScanRequest`s are constructed.

I couldn't find a good way to test this. It's possible I missed something and
would love any tips, but I think we may need to wait until the plumbing into
the KV API is complete before performing end-to-end tests.

The next step will be to hook the locking modes into the KV API.

Release note: None
…Result

This allows the handling of added, updated, and resolved intents to
become more reactive. It mirrors our handling of UpdatedTxns and their
interaction with the TxnWaitQueue.
…totype SFU

Informs cockroachdb#41720.

This commit creates a new concurrency package that provides a concurrency
manager structure that encapsulates the details of concurrency control and
contention handling for serializable key-value transactions. Any reader of this
commit should start at `concurrency_control.go` and move out from there.

The new package has a few primary objectives:
1. centralize the handling of request synchronization and transaction contention
   handling in a single location, allowing for the topic to be documented and
   understood in isolation.
2. rework contention handling to react to intent state transitions directly. This
   simplifies the transaction queueing story, reduces the frequency of transaction
   push RPCs, and allows waiters to proceed after intent resolution as soon as possible.
3. create a framework that naturally permits "update" locking, which is required for
   kv-level SELECT FOR UPDATE support (cockroachdb#6583).
4. provide stronger guarantees around fairness when transactions conflict, in order
   to reduce tail latencies under contended scenarios.
5. create a structure that can extend to address the long-term goals of a fully
   centralized lock-table laid out in cockroachdb#41720.

WARNING: this is still a WIP. Notably, the lockTableImpl is mocked out with a
working but incomplete implementation. See cockroachdb#43740 for a more complete strawman.

Release note: None
This commit hacks upgrade locking into ScanRequests and uses it in YCSB.
@nvanbenschoten nvanbenschoten requested a review from a team as a code owner January 28, 2020 22:58
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 6, 2020
Informs cockroachdb#41720.

The commit creates a new concurrency package that provides a concurrency manager
structure that encapsulates the details of concurrency control and contention
handling for serializable key-value transactions. Interested readers should
start at `concurrency_control.go` and move out from there.

The new package has a few primary objectives:

1. centralize the handling of request synchronization and transaction contention
handling in a single location, allowing for the topic to be documented,
understood, and tested in isolation.

2. rework contention handling to react to intent state transitions directly.
This simplifies the transaction queueing story, reduces the frequency of
transaction push RPCs, and allows waiters to proceed immediately after intent
resolution.

3. create a framework that naturally permits "update" locking, which is required
for kv-level SELECT FOR UPDATE support (cockroachdb#6583).

4. provide stronger guarantees around fairness when transactions conflict, to
reduce tail latencies under contended scenarios.

5. create a structure that can extend to address the long-term goals of a fully
centralized lock-table laid out in cockroachdb#41720.

This commit pulls over a lot of already reviewed code from cockroachdb#43775. The big
differences are that it updates the lockTable interface to match the one
introduced in cockroachdb#43740 and it addresses the remaining TODOs to document the rest
of the concurrency control mechanisms in CockroachDB. At this point, a reader
of this file should get a good idea of how KV transactions interact in CRDB...
now we just need to make the system actually work this way.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 7, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 8, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 10, 2020
craig bot pushed a commit that referenced this pull request Feb 10, 2020
44787: storage/concurrency: define concurrency control interfaces r=nvanbenschoten a=nvanbenschoten

Informs #41720.

The commit creates a new concurrency package that provides a concurrency manager structure that encapsulates the details of concurrency control and contention handling for serializable key-value transactions. Interested readers should start at `concurrency_control.go` and move out from there.

The new package has a few primary objectives:

1. centralize the handling of request synchronization and transaction contention handling in a single location, allowing for the topic to be documented, understood, and tested in isolation.
2. rework contention handling to react to intent state transitions directly. This simplifies the transaction queueing story, reduces the frequency of transaction push RPCs, and allows waiters to proceed immediately after intent resolution.
3. create a framework that naturally permits "update" locking, which is required for kv-level SELECT FOR UPDATE support (#6583).
4. provide stronger guarantees around fairness when transactions conflict, to reduce tail latencies under contended scenarios.
5. create a structure that can extend to address the long-term goals of a fully centralized lock-table laid out in #41720.

This commit pulls over a lot of already reviewed code from #43775. The big differences are that it updates the lockTable interface to match the one introduced in #43740 and it addresses the remaining TODOs to document the rest of the concurrency control mechanisms in CockroachDB. At this point, a reader of this file should get a good idea of how KV transactions interact in CRDB... now we just need to make the system actually work this way.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 18, 2020
…ments

This commit adds support for implicitly applying FOR UPDATE row-level
locking modes to the initial row scan of UPDATE statements.

Conceptually, if we picture an UPDATE statement as the composition of a
SELECT statement and an INSERT statement (with loosened semantics around
existing rows) then this change performs the following transformation:

```
UPDATE t = SELECT FROM t + INSERT INTO t
=>
UPDATE t = SELECT FROM t FOR UPDATE + INSERT INTO t
```

The transformation is conditional on the UPDATE expression tree matching a
pattern. Specifically, the FOR UPDATE locking mode is only used during the
initial row scan when all row filters have been pushed into the ScanExpr. If
the statement includes any filters that cannot be pushed into the scan then
no row-level locking mode is applied. The rationale here is that FOR UPDATE
locking is not necessary for correctness due to serializable isolation, so it
is strictly a performance optimization for contended writes. Therefore, it is
not worth risking the transformation being a pessimization, so it is only
applied when doing so does not risk creating artificial contention.

The change introduces a new `enable_implicit_select_for_update` session variable
that controls whether this transformation is applied to mutation statements. It
also introduces a `sql.defaults.implicit_select_for_update.enabled` cluster setting
to serve as the default value for the session variable.

The locking mode is still ignored by the key-value layer, but that will change
in the next few days. Once that happens, we'll be able to gather performance
numbers (past what's already been posted in cockroachdb#43775) about the performance impact
this change has on uncontended and contended workloads.

Release note (sql change): UPDATE statements now acquire locks using the FOR
UPDATE locking mode during their initial row scan, which improves performance
for contended workloads. This behavior is configurable using the
`enable_implicit_select_for_update` session variable and the
`sql.defaults.implicit_select_for_update.enabled` cluster setting.
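
For reference, toggling the transformation from a client might look like the following. The variable and setting names come from the commit message above; the helper itself is hypothetical.

```go
package sketch

import "database/sql"

// disableImplicitSFU (hypothetical helper) turns the transformation off for
// the current session and flips the cluster-wide default.
func disableImplicitSFU(db *sql.DB) error {
	if _, err := db.Exec(`SET enable_implicit_select_for_update = off`); err != nil {
		return err
	}
	_, err := db.Exec(`SET CLUSTER SETTING sql.defaults.implicit_select_for_update.enabled = false`)
	return err
}
```
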
craig bot pushed a commit that referenced this pull request Feb 19, 2020
45062: storage/concurrency: implement concurrency Manager r=nvanbenschoten a=nvanbenschoten

Informs #41720.
Informs #44976.
Initially drawn out in #43775.

This PR implements the concurrency.Manager interface, which is the core structure that ties together the new concurrency package. 

The concurrency manager is a structure that sequences incoming requests and provides isolation between requests that intend to perform conflicting operations. During sequencing, conflicts are discovered and any found are resolved through a combination of passive queuing and active pushing. Once a request has been sequenced, it is free to evaluate without concerns of conflicting with other in-flight requests due to the isolation provided by the manager. This isolation is guaranteed for the lifetime of the request but terminates once the request completes.

The manager accomplishes this by piecing together the following components in its request sequencing path:
- `latchManager`
- `lockTable`
- `lockTableWaiter`
- `txnWaitQueue`

The largest part of this change is introducing the datadriven testing framework to deterministically test the concurrency manager. This proved difficult for two reasons:
1. the concurrency manager composes several components to perform its work (latchManager, lockTable, lockTableWaiter, txnWaitQueue). It was difficult to get consistent observability into each of these components in such a way that tests could be run against a set of concurrent requests interacting with them all.
2. the concurrency manager exposes a primarily blocking interface. Requests call `Sequence()` and wait for sequencing to complete. This may block in a number of different places - while waiting on latches, while waiting on locks, and while waiting on other transactions. The most important part of these tests is to assert _where_ a given request blocks based on the current state of the concurrency manager and then assert _how_ the request reacts to a state transition by another request.

To address the first problem, the testing harness uses the context-carried tracing infrastructure to track the path of a request. We already had log events scattered throughout these various components, so this did not require digging testing hooks into each of them. Instead, the harness attaches a trace recording span to each request and watches as events are added to the span. It then uses these events as the output of the test.

To address the second problem, the testing harness introduces a monitor object which manages a collection of "monitored" goroutines. The monitor watches as these goroutines run and keeps track of their goroutine state as is reported by a goroutine dump. During each step of the datadriven test, the monitor allows all goroutines to proceed until they have either terminated or stalled due to cross-goroutine synchronization dependencies. For instance, it waits for all goroutines to stall while receiving from channels. We can be sure that the goroutine dump provides a consistent snapshot of all goroutine states and statuses because `runtime.Stack(all=true)` stops the world when called. This means that when all monitored goroutines are simultaneously stalled, we have a deadlock that can only be resolved by proceeding forward with the test and releasing latches, resolving locks, or committing transactions. This structure worked surprisingly well and has held up to long periods of stressrace.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
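
To make the monitor idea concrete, here is a hedged sketch of the stall check, assuming we only care about channel-related wait states and skipping the bookkeeping (tracking specific goroutine IDs, ignoring runtime-internal goroutines) that a real harness needs:

```go
package sketch

import (
	"runtime"
	"strings"
)

// stalledStates are goroutine wait states that indicate a goroutine is
// blocked on channel synchronization.
var stalledStates = []string{"[chan receive]", "[chan send]", "[select]"}

// allStalled reports whether every other goroutine in the process is
// blocked on a channel operation. runtime.Stack with all=true stops the
// world, so the dump is a consistent snapshot.
func allStalled() bool {
	buf := make([]byte, 1<<20) // assume the dump fits; a real harness grows the buffer
	n := runtime.Stack(buf, true)
	records := strings.Split(string(buf[:n]), "\n\n")
	// records[0] is the calling goroutine itself (state "[running]"); skip it.
	for _, rec := range records[1:] {
		header := strings.SplitN(rec, "\n", 2)[0] // e.g. "goroutine 7 [chan receive]:"
		stalled := false
		for _, s := range stalledStates {
			if strings.Contains(header, s) {
				stalled = true
				break
			}
		}
		if !stalled {
			return false
		}
	}
	return true
}
```
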
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 19, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 20, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 26, 2020
craig bot pushed a commit that referenced this pull request Feb 26, 2020
45159: sql/opt/exec: add implicit SELECT FOR UPDATE support for UPDATE statements r=nvanbenschoten a=nvanbenschoten

This commit adds support for implicitly applying FOR UPDATE row-level locking modes to the initial row scan of UPDATE statements.

Conceptually, if we picture an UPDATE statement as the composition of a SELECT statement and an INSERT statement (with loosened semantics around existing rows) then this change performs the following transformation:

```
UPDATE t = SELECT FROM t + INSERT INTO t
=>
UPDATE t = SELECT FROM t FOR UPDATE + INSERT INTO t
```

The transformation is conditional on the UPDATE expression tree matching a pattern. Specifically, the FOR UPDATE locking mode is only used during the initial row scan when all row filters have been pushed into the ScanExpr. If the statement includes any filters that cannot be pushed into the scan then no row-level locking mode is applied. The rationale here is that FOR UPDATE locking is not necessary for correctness due to serializable isolation, so it is strictly a performance optimization for contended writes. Therefore, it is not worth risking the transformation being a pessimization, so it is only applied when doing so does not risk creating artificial contention.

The change introduces a new `enable_implicit_select_for_update` session variable that controls whether this transformation is applied to mutation statements. It also introduces a `sql.defaults.implicit_select_for_update.enabled` cluster setting to serve as the default value for the session variable.

The locking mode is still ignored by the key-value layer, but that will change in the next few days. Once that happens, we'll be able to gather performance numbers (past what's already been posted in #43775) about the performance impact this change has on uncontended and contended workloads.

Release note (sql change): UPDATE statements now acquire locks using the FOR UPDATE locking mode during their initial row scan, which improves performance for contended workloads. This behavior is configurable using the `enable_implicit_select_for_update` session variable and the `sql.defaults.implicit_select_for_update.enabled` cluster setting.

45459: sqlsmith: silence sqlsmith issue filing after query timeout r=yuzefovich a=rohany

Release note: None

45462: storage/engine: micro-optimize MVCCKeyCompare r=petermattis a=petermattis

`MVCCKeyCompare` is the single largest CPU usage function in a
`tpccbench` test. So give it a bit of micro-optimization love. While
we're here, move this function next to the rest of the Pebble code as
that is the main user.

name               old time/op  new time/op  delta
MVCCKeyCompare-16  15.8ns ± 4%   9.1ns ± 1%  -42.57%  (p=0.000 n=10+10)

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
Co-authored-by: Peter Mattis <petermattis@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Mar 4, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Mar 4, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Mar 6, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Mar 6, 2020
Closes cockroachdb#40205.
Informs cockroachdb#41720.

This change teaches the KV client and the KV API about unreplicated locks.
It then adds a KeyLocking mode to ScanRequest and ReverseScanRequest, which
allows their users to select the locking strength that they would like the
scan to use. This locking strength defaults to None, which corresponds to
the current behavior. However, some users will want to acquire locks on each
row scanned, which is now possible by setting the locking strength to a
stronger level. For now, only the Exclusive strength is supported.

The change then revisits SQL's row-level locking support, which is supported
all the way down to the row fetcher for implicit (e.g. UPDATE) and explicit
(e.g. SELECT ... FOR UPDATE) upgrade locking. The change uses the new
key-locking functionality in the KV API to hook up row-level locking,
completing the integration of SELECT FOR UPDATE with the KV layer and,
in particular, the new lock-table structure.

cockroachdb#43775 described the three main
benefits of this change:
- higher throughput under contention
- lower latency and improved fairness under contention
- a reduction in transaction retries under contention

I've revisited those results a few times in the last two months and seen that
the results continue to hold, and in some cases they have improved. I intend
to update this PR with a more complete analysis of its impact on those three
areas.

Release note (sql change): SELECT FOR UPDATE now hooks into a new
leaseholder-only locking mechanism. This allows the feature to be used
to improve performance of transactions that read, modify, and write
contended rows. Similarly, UPDATE statements now use this new
mechanism by default, meaning that their performance under contention is
improved. This is only enabled for UPDATE statements that can push their
filter all the way into their key-value scan. To determine whether an
UPDATE statement is implicitly using SELECT FOR UPDATE locking, look
for a "locking strength" field in the EXPLAIN output for the statement.
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/selectForUpdate2 branch March 16, 2020 16:55