
Document new approach to locking in KV layer #7020

Merged (1 commit) on Apr 14, 2020

Conversation

rmloveland
Contributor

Fixes #6571.

Summary of changes (all to 'Transaction Layer' page):

  • Update 'Writes and reads' section to note that there are now two
    types of locks: write intents, and unreplicated locks managed by the
    concurrency manager in a per-node lock table (about which more below).

  • Update 'Write intents' section with information about new concurrency
    control mechanism using unreplicated lock table.

  • Add new 'Concurrency control' section describing concurrency manager
    and lock table at a high level, with links to SELECT FOR UPDATE
    docs. This is followed by more detailed subsections that are mostly
    adapted from the in-code comments in
    pkg/kv/kvserver/concurrency/concurrency_control.go.

@cockroach-teamcity
Member

This change is Reviewable

@rmloveland rmloveland force-pushed the 20200318-concurrency-manager-and-lock-table branch from 2c31804 to d654d54 on March 30, 2020 19:39
@rmloveland rmloveland force-pushed the 20200318-concurrency-manager-and-lock-table branch from d654d54 to d9d0aec on March 30, 2020 20:00
Member

@nvanbenschoten nvanbenschoten left a comment


Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rmloveland)


v20.1/architecture/transaction-layer.md, line 33 at r1 (raw file):

  - **Unreplicated Locks** which are stored in an in-memory, per-node lock table by the [concurrency control](#concurrency-control) machinery. These locks are not replicated via [Raft](replication-layer.html#raft).

  - **Write intents**, which are replicated via [Raft](replication-layer.html#raft), and act as exclusive locks. They are essentially the same as standard [multi-version concurrency control (MVCC)](storage-layer.html#mvcc) values but also contain a pointer to the [transaction record](#transaction-records) stored on the cluster.

I think we should use the word "provisional value" somewhere in here. We've shifted towards thinking of a write intent as a combination of a replicated lock and a replicated provisional value.
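For illustration, that "replicated lock + replicated provisional value" framing could be sketched with a toy Go type. All names here (`WriteIntent`, `Resolve`, the fields) are hypothetical and not CockroachDB's actual schema:

```go
package main

import "fmt"

// WriteIntent models the "replicated lock + replicated provisional
// value" framing: an MVCC-style value plus a pointer to the writing
// transaction's record. (Toy type; not CockroachDB's real schema.)
type WriteIntent struct {
	Key         string
	Provisional []byte // the provisional MVCC value
	TxnRecord   string // pointer (key) to the transaction record
}

// Resolve turns the intent into an ordinary committed value by
// dropping the transaction-record pointer: the intent is "resolved"
// rather than deleted, since the provisional value becomes the
// real one.
func (w WriteIntent) Resolve() (key string, value []byte) {
	return w.Key, w.Provisional
}

func main() {
	wi := WriteIntent{Key: "a", Provisional: []byte("v1"), TxnRecord: "/txn/1234"}
	k, v := wi.Resolve()
	fmt.Println(k, string(v))
}
```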


v20.1/architecture/transaction-layer.md, line 144 at r1 (raw file):

### Write intents

Values in CockroachDB are not written directly to the storage layer; instead values are written in a provisional state known as a "write intent." These are essentially MVCC records with an additional value added to them which identifies the transaction record to which the value belongs.They can be thought of as replicated, exclusive locks.

Missing a space between "belongs." and "They".


v20.1/architecture/transaction-layer.md, line 148 at r1 (raw file):

Whenever an operation encounters a write intent (instead of an MVCC value), it looks up the status of the transaction record to understand how it should treat the write intent value. If the transaction record is missing, the operation checks the write intent's timestamp and evaluates whether or not it is considered expired.

<span class="version-tag">New in v20.1:</span> CockroachDB manages concurrency control using a per-node, in-memory lock table which holds a collection of locks acquired by in-progress transactions, and incorporates information about write intents as they are discovered during evaluation. For more information, see the section below on [Concurrency control](#concurrency-control).

Very well said.


v20.1/architecture/transaction-layer.md, line 160 at r1 (raw file):

- _Record does not exist_: If the write intent was created within the transaction liveness threshold, it's the same as `PENDING`, otherwise it's treated as `ABORTED`.

### Concurrency control

The concurrency manager is now in charge of the latch manager, so we should pull the latch manager section down here. The way to think of this is that, on its own, the latch manager sequences incoming requests and provides isolation between those requests (what you said here). The concurrency manager combines the latch manager and the lock table to sequence incoming requests and provides isolation between the transactions in those requests.
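As a toy illustration of that division of labor, a sequencing step might look like the Go sketch below. `lockTable` and `sequence` are invented names, and latching itself is elided; the point is only that lock-table conflicts (transaction isolation) are checked during sequencing, before evaluation:

```go
package main

import "fmt"

// Toy model of the distinction above: the latch manager isolates
// requests for the duration of evaluation, while the lock table
// isolates transactions, whose locks outlive individual requests.
type lockTable map[string]string // key -> holder txn ID

// sequence admits a request over key on behalf of txn. It reports
// whether the request may evaluate now, or which txn it must wait on.
func sequence(lt lockTable, key, txn string) (proceed bool, waitFor string) {
	// 1. Latch acquisition would happen here (omitted: latches are
	//    short-lived and released when the request finishes).
	// 2. Lock table check: a conflicting lock held by another
	//    transaction forces the request to queue until release.
	if holder, ok := lt[key]; ok && holder != txn {
		return false, holder
	}
	return true, ""
}

func main() {
	lt := lockTable{"k1": "txn-A"}
	ok, holder := sequence(lt, "k1", "txn-B")
	fmt.Println(ok, holder) // txn-B must wait on txn-A
	ok, _ = sequence(lt, "k1", "txn-A")
	fmt.Println(ok) // the lock holder itself proceeds
}
```

Because the conflict is found at sequencing time, evaluation never has to re-check these locks, matching the quoted text.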


v20.1/architecture/transaction-layer.md, line 179 at r1 (raw file):

Transactions require isolation both within requests and across requests. The manager accommodates this by allowing transactional requests to acquire locks, which outlive the requests themselves. Locks extend the duration of the isolation provided over specific keys to the lifetime of the lock-holder transaction itself. They are (typically) only released when the transaction commits or aborts. Other requests that find these locks while being sequenced wait on them to be released in a queue before proceeding. Because locks are checked during sequencing, requests are guaranteed access to all declared keys after they have been sequenced. In other words, locks don't need to be checked again during evaluation.

However, at the time of writing, not all locks are stored directly under the manager's control, so not all locks are discoverable during sequencing. Specifically, write intents (replicated, exclusive locks) are stored inline in the MVCC keyspace, so they are not detectable until request evaluation time. To accommodate this form of lock storage, the manager integrates information about external locks with the concurrency manager structure.

Does it make sense to say "at the time of writing" in technical documentation, as opposed to code? I'm not saying this is wrong, just that I'm not sure.


v20.1/architecture/transaction-layer.md, line 193 at r1 (raw file):

The database is read and written using "requests". Transactions are composed of one or more requests. Isolation is needed across requests. Additionally, since transactions represent a group of requests, isolation is needed across such groups. Part of this isolation is accomplished by maintaining multiple versions and part by allowing requests to acquire locks. Even the isolation based on multiple versions requires some form of mutual exclusion to ensure that a read and a conflicting lock acquisition do not happen concurrently. The lock table provides both locking and sequencing of requests (in concert with the use of latches). The lock table sequences both transactional and non-transactional requests, but the latter cannot acquire locks.

Locks outlive the requests themselves and thereby extend the duration of the isolation provided over specific keys to the lifetime of the lock-holder transaction itself. They are (typically) only released when the transaction commits or aborts. Other requests that find these locks while being sequenced wait on them to be released in a queue before proceeding. Because locks are checked during sequencing, requests are guaranteed access to all declared keys after they have been sequenced. In other words, locks don't need to be checked again during evaluation.

"declared keys" seems like an implementation detail to me.

Contributor

@irfansharif irfansharif left a comment


Left a few (unsolicited) comments, to help my own understanding mostly.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @rmloveland)


v20.1/architecture/transaction-layer.md, line 17 at r1 (raw file):

Above all else, CockroachDB believes consistency is the most important feature of a database––without it, developers cannot build reliable tools, and businesses suffer from potentially subtle and hard to detect anomalies.

To provide consistency, CockroachDB implements full support for ACID transaction semantics in the transaction layer. However, it's important to realize that *all* statements are handled as transactions, including single statements––this is sometimes referred to as "autocommit mode" because it behaves as if every statement is followed by a `COMMIT`.

Not this change but I believe it's clearer to describe these as "implicit txns". That's actually the phrase we use internally, and I believe the phrase that's reflected in the Statements page.


v20.1/architecture/transaction-layer.md, line 33 at r1 (raw file):

  - **Unreplicated Locks** which are stored in an in-memory, per-node lock table by the [concurrency control](#concurrency-control) machinery. These locks are not replicated via [Raft](replication-layer.html#raft).

  - **Write intents**, which are replicated via [Raft](replication-layer.html#raft), and act as exclusive locks. They are essentially the same as standard [multi-version concurrency control (MVCC)](storage-layer.html#mvcc) values but also contain a pointer to the [transaction record](#transaction-records) stored on the cluster.

Maybe section this off as "Replicated Locks (aka Write Intents) (which also happen to be replicated provisional values)"?


v20.1/architecture/transaction-layer.md, line 37 at r1 (raw file):

- A **transaction record** stored in the range where the first write occurs, which includes the transaction's current state (which is either `PENDING`, `STAGING`, `COMMITTED`, or `ABORTED`).

As write intents are created, CockroachDB checks for newer committed values––if they exist, the transaction is restarted––and existing write intents for the same keys––which is resolved as a [transaction conflict](#transaction-conflicts).

Something is a bit off with this sentence structure.


v20.1/architecture/transaction-layer.md, line 43 at r1 (raw file):

#### Reading

If the transaction has not been aborted, the transaction layer begins executing read operations. If a read only encounters standard MVCC values, everything is fine. However, if it encounters any write intents, the operation must be resolved as a [transaction conflict](#transaction-conflicts).

This reads like we're executing "reads" after "writes" (the preceding section), which is not the case.


v20.1/architecture/transaction-layer.md, line 47 at r1 (raw file):

### Commits (phase 2)

CockroachDB checks the running transaction's record to see if it's been `ABORTED`; if it has, it restarts the transaction.

It could have to error out to the client, needing a client-side retry instead of an internal retry. Also, after a txn is done with all its reads and writes, it doesn't have to recheck its txn status to see if it has been aborted. If it had been aborted, it would have discovered as much during the reads/writes.


v20.1/architecture/transaction-layer.md, line 51 at r1 (raw file):

In the common case, it sets the transaction record's state to `STAGING`, and checks the transaction's pending write intents to see if they have succeeded (i.e., been replicated across the cluster).

If the transaction passes these checks, CockroachDB responds with the transaction's success to the client, and moves on to the cleanup phase. At this point, the transaction is committed, and the client is free to begin sending more requests to the cluster.

I think it's clearer to say the client can be informed of the commit, instead of saying the client is free to issue further requests (through further txns).
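The `STAGING` check described in the quoted text could be sketched as follows. `txnRecord` and `implicitlyCommitted` are made-up names, and the real protocol (Parallel Commits) involves more than this; the sketch only shows the "committed once all pending intents have replicated" condition:

```go
package main

import "fmt"

// txnRecord is a toy transaction record for illustrating the STAGING
// state: the record tracks which pending write intents have been
// successfully replicated across the cluster.
type txnRecord struct {
	state          string
	pendingIntents map[string]bool // intent key -> replicated?
}

// implicitlyCommitted reports whether a STAGING record is effectively
// committed: every pending write intent it references has replicated.
func implicitlyCommitted(r txnRecord) bool {
	if r.state != "STAGING" {
		return false
	}
	for _, replicated := range r.pendingIntents {
		if !replicated {
			return false
		}
	}
	return true
}

func main() {
	r := txnRecord{state: "STAGING", pendingIntents: map[string]bool{"a": true, "b": true}}
	fmt.Println(implicitlyCommitted(r)) // all intents replicated
	r.pendingIntents["b"] = false
	fmt.Println(implicitlyCommitted(r)) // a straggler intent blocks commit
}
```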


v20.1/architecture/transaction-layer.md, line 61 at r1 (raw file):

- Moves the state of the transaction record from `STAGING` to `COMMITTED`.
- Resolves the transaction's write intents to MVCC values by removing the element that points it to the transaction record.
- Deletes the write intents.

I think this last part is redundant with the above. We only ever "resolve" write intents; we don't delete them outright, seeing as they're the provisional values (and now no longer provisional).


v20.1/architecture/transaction-layer.md, line 63 at r1 (raw file):

- Deletes the write intents.

This is simply an optimization, though. If operations in the future encounter write intents, they always check their transaction records––any operation can resolve or remove write intents by checking the transaction record's status.

Ambiguous "their", I think. We're checking the txn responsible for the write intent we observed.


v20.1/architecture/transaction-layer.md, line 118 at r1 (raw file):

To enforce this serialization, the leaseholder creates a "latch" for the keys in the write value, providing uncontested access to the keys. If other operations come into the leaseholder for the same set of keys, they must wait for the latch to be released before they can proceed.

Of note, only write operations generate a latch for the keys. Read operations do not block other operations from executing.

Is this right, @nvanbenschoten? We certainly have read latches, and they do block other operations/requests from executing.


v20.1/architecture/transaction-layer.md, line 138 at r1 (raw file):

- `STAGING`: Used to enable the [Parallel Commits](#parallel-commits) feature. Depending on the state of the write intents referenced by this record, the transaction may or may not be in a committed state.
- `ABORTED`: Indicates that the transaction was aborted and its values should be discarded.
- _Record does not exist_: If a transaction encounters a write intent whose transaction record doesn't exist, it uses the write intent's timestamp to determine how to proceed. If the write intent's timestamp is within the transaction liveness threshold, the write intent's transaction is treated as if it is `PENDING`, otherwise it's treated as if the transaction is `ABORTED`.

Probably not worth mentioning but just FYI, the reason a txn record is found to not exist is because we're lazy about creating txn records. For short txns we try to immediately go to the STAGING state, instead of first writing out an explicit PENDING state.


v20.1/architecture/transaction-layer.md, line 160 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The concurrency manager is now in charge of the latch manager, so we should pull the latch manager section down here. The way to think of this is that, on its own, the latch manager sequences incoming requests and provides isolation between those requests (what you said here). The concurrency manager combines the latch manager and the lock table to sequence incoming requests and provides isolation between the transactions in those requests.

Ah, I understand it now as well. Thanks!

We could rephrase the use of "operations" in the latch manager section to use the terminology used here, namely "requests".


v20.1/architecture/transaction-layer.md, line 177 at r1 (raw file):

<span class="version-tag">New in v20.1:</span> The concurrency manager is a structure that sequences incoming requests and provides isolation between requests that intend to perform conflicting operations. During sequencing, conflicts are discovered and any found are resolved through a combination of passive queuing and active pushing. Once a request has been sequenced, it is free to evaluate without concerns of conflicting with other in-flight requests due to the isolation provided by the manager. This isolation is guaranteed for the lifetime of the request but terminates once the request completes.

Transactions require isolation both within requests and across requests. The manager accommodates this by allowing transactional requests to acquire locks, which outlive the requests themselves. Locks extend the duration of the isolation provided over specific keys to the lifetime of the lock-holder transaction itself. They are (typically) only released when the transaction commits or aborts. Other requests that find these locks while being sequenced wait on them to be released in a queue before proceeding. Because locks are checked during sequencing, requests are guaranteed access to all declared keys after they have been sequenced. In other words, locks don't need to be checked again during evaluation.

See comment below regarding "isolation within requests". I'm finding the first sentence a bit hard to understand.


v20.1/architecture/transaction-layer.md, line 191 at r1 (raw file):

<span class="version-tag">New in v20.1:</span> The lock table is a per-node, in-memory data structure that holds a collection of locks acquired by in-progress transactions. Each lock in the table has a possibly-empty lock wait-queue associated with it, where conflicting transactions can queue while waiting for the lock to be released.  Items in the locally stored lock wait-queue are propagated as necessary (via RPC) to the existing [`TxnWaitQueue`](#txnwaitqueue), which is stored on the leader of the range's Raft group that contains the [transaction record](#transaction-records).

The database is read and written using "requests". Transactions are composed of one or more requests. Isolation is needed across requests. Additionally, since transactions represent a group of requests, isolation is needed across such groups. Part of this isolation is accomplished by maintaining multiple versions and part by allowing requests to acquire locks. Even the isolation based on multiple versions requires some form of mutual exclusion to ensure that a read and a conflicting lock acquisition do not happen concurrently. The lock table provides both locking and sequencing of requests (in concert with the use of latches). The lock table sequences both transactional and non-transactional requests, but the latter cannot acquire locks.

I may be wrong here, but shouldn't it be "Isolation is needed within requests. Additionally [...]" (borrowing from phrasing earlier in the page)? I'm not sure how isolation across groups of requests is any different from isolation across requests. Come to think of it, what does isolation within a request even mean?
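As a rough illustration of the lock wait-queue behavior described in the quoted text, here is a minimal sketch. The names (`lockEntry`, `acquire`, `release`) are illustrative only and have no relation to the real lock table implementation:

```go
package main

import "fmt"

// lockEntry is a toy lock-table entry: a holder plus a possibly-empty
// queue of conflicting transactions waiting for the lock's release.
type lockEntry struct {
	holder    string
	waitQueue []string
}

type lockTable map[string]*lockEntry

// acquire grants the lock if it is free; otherwise it enqueues txn
// in the lock's wait-queue and reports false.
func (lt lockTable) acquire(key, txn string) bool {
	e, ok := lt[key]
	if !ok {
		lt[key] = &lockEntry{holder: txn}
		return true
	}
	e.waitQueue = append(e.waitQueue, txn)
	return false
}

// release hands the lock to the head of the wait-queue, if any,
// mirroring "wait on them to be released in a queue before
// proceeding" from the quoted text.
func (lt lockTable) release(key string) {
	e := lt[key]
	if len(e.waitQueue) == 0 {
		delete(lt, key)
		return
	}
	e.holder, e.waitQueue = e.waitQueue[0], e.waitQueue[1:]
}

func main() {
	lt := lockTable{}
	fmt.Println(lt.acquire("k", "txn-A")) // granted
	fmt.Println(lt.acquire("k", "txn-B")) // queued behind txn-A
	lt.release("k")
	fmt.Println(lt["k"].holder) // txn-B now holds the lock
}
```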


v20.1/architecture/transaction-layer.md, line 196 at r1 (raw file):

are encountered during evaluation.

I believe we have to resequence them if we discover new locks. @nvanbenschoten?

Contributor

@irfansharif irfansharif left a comment


Didn't mean to block (Request Changes) the PR.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @rmloveland)

Contributor

@irfansharif irfansharif left a comment


I'll mark this as approved here in light of #7057.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @rmloveland)

@rmloveland rmloveland force-pushed the 20200318-concurrency-manager-and-lock-table branch from d9d0aec to 4defb19 on April 2, 2020 16:16
Contributor Author

@rmloveland rmloveland left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @nvanbenschoten, and @rmloveland)


v20.1/architecture/transaction-layer.md, line 17 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Not this change but I believe it's clearer to describe these as "implicit txns". That's actually the phrase we use internally, and I believe the phrase that's reflected in the Statements page.

Thanks for these comments, Irfan. I've filed #7057 to capture this and other comments that are about this page but not necessarily this PR so we can make these improvements.


v20.1/architecture/transaction-layer.md, line 33 at r1 (raw file):
Updated to say

which are replicated via Raft, and act as a combination of a provisional value and an exclusive lock

Prompted by this, I have also updated the description in the 'Write intents' section to say

They can be thought of as a combination of a replicated lock and a replicated provisional value.


v20.1/architecture/transaction-layer.md, line 33 at r1 (raw file):
Good idea. Updated to


v20.1/architecture/transaction-layer.md, line 138 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Probably not worth mentioning but just FYI, the reason a txn record is found to not exist is because we're lazy about creating txn records. For short txns we try to immediately go to the STAGING state, instead of first writing out and explicit PENDING state.

Yes, we should consider updating this section to be a bit more explicit about that. Right now I think that info is entirely in the section on parallel commits. Filed #7056 to capture this.


v20.1/architecture/transaction-layer.md, line 144 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Missing a space bettween "belongs." and "They"

Fixed. Also updated this description to use the "provisional value" language.


v20.1/architecture/transaction-layer.md, line 148 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Very well said.

Thank you!


v20.1/architecture/transaction-layer.md, line 160 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Ah, I understand it now as well. Thanks!

We could rephrase the use of "operations" in the latch manager section to use the terminology used here, namely "requests".

Thanks, that makes things clearer!

Revised this section a bit as follows:

  • Added some info about the latch manager to the 'Concurrency control' section's overview. Nathan, I tried to make clear the distinction you shared re: sequencing requests vs. sequencing transactions in the requests.
  • Pulled in 'Latch manager' section, and added a little intro saying what it does, and that it's operating in the context of the concurrency manager. Irfan, as you requested I changed most uses of 'operation' to 'request' for consistency with terms elsewhere here, with minor wording tweaks.

v20.1/architecture/transaction-layer.md, line 179 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does it make sense to say "at the time of writing" in technical documentation, as opposed to code? I'm not saying this is wrong, just that I'm not sure.

No it isn't really necessary in this context. Deleted it, thanks for the mention.


v20.1/architecture/transaction-layer.md, line 191 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

I may be wrong here, but shouldn't it be "Isolation is needed within requests. Additionally [...]" (borrowing from phrasing earlier in the page)? I'm not sure how isolation across groups of requests is any different from isolation across requests. Come to think of it, what does isolation within a request even mean?

Re: isolation across requests vs. groups of requests, I think "request" = "1 op (maybe transactional, maybe not)" and "group of requests" = "multiple ops (implying a transaction)". At least that's the way I've been reading this.

Re: "isolation within" vs. "isolation across", I'm not sure. Maybe they are intended to mean the same thing?

That said, I'm not at all sure about either of the above. Nathan, please advise :-)


v20.1/architecture/transaction-layer.md, line 193 at r1 (raw file):
Makes sense. Combined the last 2 sentences to keep it a bit higher level:

Because locks are checked during sequencing, locks don't need to be checked again during evaluation.

Contributor

@irfansharif irfansharif left a comment


Reviewed 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @nvanbenschoten, and @rmloveland)


v20.1/architecture/transaction-layer.md, line 17 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Thanks for these comments, Irfan. I've filed #7057 to capture this and other comments that are about this page but not necessarily this PR so we can make these improvements.

Thanks! Sorry for having added them here, I was trying to learn about this lock table stuff and was reading back and forth while doing so.


v20.1/architecture/transaction-layer.md, line 118 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Is this right, @nvanbenschoten? We certainly have read latches, and they do block other operations/requests from executing.

This text was moved below, but I believe the point still applies.


v20.1/architecture/transaction-layer.md, line 157 at r2 (raw file):

- The *latch manager* sequences the incoming requests and provides isolation between those requests.
- The *lock table* provides both locking and sequencing of requests (in concert with the latch manager). The lock table sequences both transactional and non-transactional requests.  It is a per-node, in-memory data structure that holds a collection of locks acquired by in-progress transactions. To ensure compatibility with the existing system of [write intents](#write-intents) (a.k.a. replicated, exclusive locks), it pulls in information about these external locks as necessary when they are discovered in the course of evaluating requests.

Does the lock table sequence non-transactional requests, @nvanbenschoten? Or is the sequencing between transactional and non-transactional requests happening at the latch manager level? My (likely incorrect) current understanding of it is "latch manager is the thing for requests" and "lock table is the thing for transactions, which are comprised of requests".

Circling back to "does the lock table sequence non-transactional requests", if so, in a couple of spots in the newly added text we exclusively reference "transactions" (like the third sentence in this paragraph), which is slightly confusing (to me at least).


v20.1/architecture/transaction-layer.md, line 169 at r2 (raw file):

#### Concurrency manager

<span class="version-tag">New in v20.1:</span> The concurrency manager is a structure that sequences incoming requests and provides isolation between the transactions in those requests that intend to perform conflicting operations. During sequencing, conflicts are discovered and any found are resolved through a combination of passive queuing and active pushing. Once a request has been sequenced, it is free to evaluate without concerns of conflicting with other in-flight requests due to the isolation provided by the manager. This isolation is guaranteed for the lifetime of the request but terminates once the request completes.

I don't have a better suggestion here but "transactions in those requests" muddles the parent-child relationship between transactions and requests, I think.


v20.1/architecture/transaction-layer.md, line 169 at r2 (raw file):

active pushing

I know what we mean by active pushing here, but is this something that has been defined elsewhere? Our use of the word "push" internally wrt txns is something I was always confused by; is it a commonly understood term? Perhaps this is one more thing to add to #7057.

Contributor Author

@rmloveland rmloveland left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @nvanbenschoten, and @rmloveland)


v20.1/architecture/transaction-layer.md, line 17 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Thanks! Sorry for having added them here, I was trying to learn about this lock table stuff and was reading back and forth while doing so.

No problem, I appreciate any commentary / feedback that will help us make better docs. :-)


v20.1/architecture/transaction-layer.md, line 118 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

This text was moved below, but I believe the point still applies.

Noted, thanks! I am waiting to hear Nathan's response to your question, and will update accordingly. (Not because I don't believe you, of course - just want to get buy-in from the primary reviewer as well)


v20.1/architecture/transaction-layer.md, line 169 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

active pushing

I know what we mean by active pushing here, but is this something that has been defined elsewhere? Our use of the word "push" internally wrt to txns is something I was always confused by, is it a commonly understood term? Perhaps this is one more thing to add to #7057.

I don't know if this is a commonly used term in the world of distributed systems in general, or not.
At the very least, I think we should update our architecture docs to define it more explicitly, so we can reference it from usages like this.
I've filed a separate ticket #7087 to capture this, since I think it's fairly important. Please add a comment there if you have more to say on this.


v20.1/architecture/transaction-layer.md, line 169 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

I don't have a better suggestion here but "transactions in those requests" muddles the parent-child relationship between transactions and requests, I think.

Yeah I'm not sure I love it either, was trying to capture the nuance but it could probably be better. Nathan, do you have an opinion on this phrasing / a better one?

@rmloveland
Copy link
Contributor Author

@nvanbenschoten I know there are some lingering questions about various details on this PR.

Leaving those aside for a moment: do you have any additional changes you'd like to see done? I think (?) your initial comments have been addressed in the latest round of changes. (But please let me know if not and I will work on addressing them.)

If you are OK with it, I'd love to pass this along for the Docs team to review so we have it up for the release.

Member

@nvanbenschoten nvanbenschoten left a comment


:lgtm: mod the read latch discussion. Other than that, I think it's ready for the Docs team to review. Thanks Rich.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif and @rmloveland)


v20.1/architecture/transaction-layer.md, line 17 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

No problem, I appreciate any commentary / feedback that will help us make better docs. :-)

@irfansharif a lot of this lines up with the documentation in concurrency_control.go, so you can also feel free to update the code comments.


v20.1/architecture/transaction-layer.md, line 118 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Noted, thanks! I am waiting to hear Nathan's response to your question, and will update accordingly. (Not because I don't believe you, of course - just want to get buy-in from the primary reviewer as well)

Yes, Irfan is correct. We do have latches for read operations, and these work as you would expect (i.e. multiple read latches over the same keys can be held concurrently, but a read latch and a write latch over the same keys cannot).
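That compatibility rule is small enough to state as code. A toy sketch (not the actual CockroachDB spanlatch API): two latches over the same keys are compatible only when both are reads.

```go
package main

import "fmt"

// access models the two latch modes discussed above.
type access int

const (
	read access = iota
	write
)

// compatible reports whether two latches over the same keys may be
// held concurrently: read/read is allowed, anything involving a
// write latch conflicts.
func compatible(a, b access) bool {
	return a == read && b == read
}

func main() {
	fmt.Println(compatible(read, read))   // concurrent reads allowed
	fmt.Println(compatible(read, write))  // read blocks behind write
	fmt.Println(compatible(write, write)) // writes serialize
}
```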


v20.1/architecture/transaction-layer.md, line 191 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Re: how isolation across requests vs. groups of requests, I think "request" = "1 op (maybe transactional, maybe not)" and "group of requests" = "multiple ops (implying a transaction)". At least that's the way I've been reading this.

Re: "isolation within" vs. "isolation across", I'm not sure. Maybe they are intended to mean the same thing?

That said, I'm not at all sure about either of the above. Nathan, please advise :-)

I think this was intending to mean that a request in a transaction should be isolated from other transactions both during its lifetime and after it has completed (assuming it acquired locks) but within the transaction's lifetime. I agree that the use of the word "within" is confusing.


v20.1/architecture/transaction-layer.md, line 196 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…
are encountered during evaluation.

I believe we have to resequence them if we discover new locks. @nvanbenschoten?

Yes, this is correct. I'll defer to Rich about whether this needs to be included in the doc or not.

To be honest, I don't know what the exact purpose of this doc is, so I'm having trouble with a lot of these suggestions. What is the goal here? To help readers understand the building blocks that make up CRDB's architecture or to help them understand the fine-grained implementation details such that they know enough to make open-source contributions if they so desire? What we have written up here so far seems to straddle the line between these two goals.


v20.1/architecture/transaction-layer.md, line 157 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Does the lock table sequence non-transactional requests, @nvanbenschoten? Or is the sequencing between transactional and non-transactional requests happening at the latch manager level? My (likely incorrect) current understanding of it is "latch manager is the thing for requests" and "lock table is the thing for transactions, which are comprised of requests".

Circling back to "does the lock table sequence non-transactional requests", if so, in a couple of spots in the newly added text we exclusively reference "transactions" (like the third sentence in this paragraph), which is slightly confusing (to me at least).

Kind of. Non-transactional requests don't acquire locks themselves, but they do still wait on other transactions' locks, so they still consult the lock table.

I don't think we should mention non-transactional requests anywhere in this document though. They're not important to someone trying to understand Cockroach and they can just as easily be thought of as transaction requests operating on only a single range.


v20.1/architecture/transaction-layer.md, line 169 at r2 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Yeah I'm not sure I love it either, was trying to capture the nuance but it could probably be better. Nathan, do you have an opinion on this phrasing / a better one?

I would flip the "in" with something like "which issued" to re-establish the parent-child relationship.

Contributor Author

@rmloveland rmloveland left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif and @rmloveland)


v20.1/architecture/transaction-layer.md, line 118 at r1 (raw file):
Thanks - updated with the following (striving for minimal edit):

Read requests also generate latches. Multiple read latches over the same keys can be held concurrently, but a read latch and a write latch over the same keys cannot.


v20.1/architecture/transaction-layer.md, line 191 at r1 (raw file):
I have updated this section to use a variant of your more precise comment here, namely:

Each request in a transaction should be isolated from other requests, both during the request's lifetime, and after the request has completed (assuming it acquired locks) but within the surrounding transaction's lifetime.


v20.1/architecture/transaction-layer.md, line 196 at r1 (raw file):
I'll hold off on adding that; I guess one could argue it's part of "adding the information about them as they are encountered" (?).

What is the goal here? To help readers understand the building blocks that make up CRDB's architecture or to help them understand the fine-grained implementation details such that they know enough to make open-source contributions if they so desire? What we have written up here so far seems to straddle the line between these two goals.

Re: the goal (from my POV at least), it's ideally the former, in practice probably somewhere between the two, but in this case ended up being weighted towards detail. I used the code comments as the bulk of the subsections, which may have been an error, in that it pulled us further along towards providing detail.

However, I think some detailed questions arise when you start to ask: why do we need a concurrency manager and locks in addition to write intents? Why is the lock table not replicated when intents are? And so forth. Tough to find the right balance.


v20.1/architecture/transaction-layer.md, line 157 at r2 (raw file):

I don't think we should mention non-transactional requests anywhere in this document though. They're not important to someone trying to understand Cockroach

I have removed mention of non-transactional requests


v20.1/architecture/transaction-layer.md, line 169 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I would flip the "in" with something like "which issued" to re-establish the parent-child relationship.

Thanks - have updated both usages of "transactions in those requests" to "transactions which issued those requests".

@rmloveland rmloveland force-pushed the 20200318-concurrency-manager-and-lock-table branch from 4defb19 to 62a37da Compare April 13, 2020 15:53
Member

@nvanbenschoten nvanbenschoten left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @lnhsingh, and @rmloveland)


v20.1/architecture/transaction-layer.md, line 169 at r2 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Thanks - have updated both usages of "transactions in those requests" to "transactions which issued those requests".

Much better.

Contributor

@lnhsingh lnhsingh left a comment

Overall, LGTM! I had a couple of nits with punctuation, etc. and a few terms I thought should be clarified (but if you think these terms are common enough, please ignore my suggestions)

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif and @rmloveland)


v20.1/architecture/transaction-layer.md, line 31 at r3 (raw file):

- **Locks** for all of a transaction’s writes, which represent a provisional, uncommitted state. CockroachDB has several different types of locking:

  - **Unreplicated Locks** which are stored in an in-memory, per-node lock table by the [concurrency control](#concurrency-control) machinery. These locks are not replicated via [Raft](replication-layer.html#raft).

Suggestion: I think you can tighten up the language by putting "Unreplicated locks are stored in an ..." and "Replicated Locks (aka write intents) are replicated via Raft..."

(remove the "which")


v20.1/architecture/transaction-layer.md, line 138 at r3 (raw file):
Suggestion: Break the first sentence into two sentences. Something like:

CockroachDB manages concurrency control using a per-node, in-memory lock table. This table holds a collection of locks acquired by in-progress transactions and incorporates information about write intents as they are discovered during evaluation.


v20.1/architecture/transaction-layer.md, line 152 at r3 (raw file):

### Concurrency control

<span class="version-tag">New in v20.1:</span> The *concurrency manager* sequences incoming requests and provides isolation between the transactions which issued those requests that intend to perform conflicting operations. This activity is also known as [concurrency control](https://en.wikipedia.org/wiki/Concurrency_control).

I think it should be "provides isolation between the transactions that issued those requests.." instead of which


v20.1/architecture/transaction-layer.md, line 171 at r3 (raw file):
I think the second comma should be moved:

Each request in a transaction should be isolated from other requests, both during the request's lifetime and after the request has completed (assuming it acquired locks), but within the surrounding transaction's lifetime.


v20.1/architecture/transaction-layer.md, line 181 at r3 (raw file):

{{site.data.alerts.end}}

Fairness is ensured between requests. In general, if any two requests conflict then the request that arrived first will be sequenced first. As such, sequencing guarantees FIFO semantics. The primary exception to this is that a request that is part of a transaction which has already acquired a lock does not need to wait on that lock during sequencing, and can therefore ignore any queue that has formed on the lock. For other exceptions to this sequencing guarantee, see the [lock table](#lock-table) section below.

Does "FIFO" stand for "first in, first out"? This seems jargony and maybe should just say "'first in, first out' semantics"


v20.1/architecture/transaction-layer.md, line 199 at r3 (raw file):

- A request that is part of a transaction which has already acquired a lock does not need to wait on that lock during sequencing, and can therefore ignore any queue that has formed on the lock.

- Contending requests that encounter different levels of contention may be sequenced in non-FIFO order. This is to allow for greater concurrency. For example, if requests *R1* and *R2* contend on key *K2*, but *R1* is also waiting at key *K1*, *R2* may slip past *R1* and evaluate.

FIFO > "first in, first out" ? Or, if you want to use the acronym, introduce the acronym in the above mention (something like "first in, first out (FIFO)")


v20.1/architecture/transaction-layer.md, line 207 at r3 (raw file):
suggestion:

...; that is, they are placed in some consistent order.


v20.1/architecture/transaction-layer.md, line 213 at r3 (raw file):

Read requests also generate latches. Multiple read latches over the same keys can be held concurrently, but a read latch and a write latch over the same keys cannot.

Another way to think of a latch is like a mutex, which is only needed for the duration of a single, low-level request. To coordinate longer-running, higher-level requests (i.e., client transactions), we use a durable system of [write intents](#write-intents).

What's a mutex? Should this be linked to an article about mutexes? Or is it an acronym that can be spelled out for easier understanding (or is it a common word that I just don't know)?

Contributor Author

@rmloveland rmloveland left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @lnhsingh, and @rmloveland)


v20.1/architecture/transaction-layer.md, line 31 at r3 (raw file):

Previously, lnhsingh (Lauren Hirata Singh) wrote…

Suggestion: I think you can tighten up the language by putting "Unreplicated locks are stored in an ..." and "Replicated Locks (aka write intents) are replicated via Raft..."

(remove the "which")

Fixed.


v20.1/architecture/transaction-layer.md, line 138 at r3 (raw file):

Previously, lnhsingh (Lauren Hirata Singh) wrote…

Suggestion: Break the first sentence into two sentences. Something like:

CockroachDB manages concurrency control using a per-node, in-memory lock table. This table holds a collection of locks acquired by in-progress transactions and incorporates information about write intents as they are discovered during evaluation.

Fixed.


v20.1/architecture/transaction-layer.md, line 152 at r3 (raw file):

Previously, lnhsingh (Lauren Hirata Singh) wrote…

I think it should be "provides isolation between the transactions that issued those requests.." instead of which

Thanks, I always mix them up. Checked out https://www.quickanddirtytips.com/education/grammar/which-versus-that-0, which confirms your comment, since the "that" here is restrictive: it refers only to the specific transactions which intend to perform conflicting operations (not all transactions).


v20.1/architecture/transaction-layer.md, line 171 at r3 (raw file):

Previously, lnhsingh (Lauren Hirata Singh) wrote…

I think the second comma should be moved:

Each request in a transaction should be isolated from other requests, both during the request's lifetime and after the request has completed (assuming it acquired locks), but within the surrounding transaction's lifetime.

Fixed.


v20.1/architecture/transaction-layer.md, line 181 at r3 (raw file):

Previously, lnhsingh (Lauren Hirata Singh) wrote…

Does "FIFO" stand for "first in, first out"? This seems jargony and maybe should just say "'first in, first out' semantics"

Yes! Went with "first-in, first-out (FIFO)" here so we can use the FIFO abbreviation later on in this page.


v20.1/architecture/transaction-layer.md, line 199 at r3 (raw file):

Previously, lnhsingh (Lauren Hirata Singh) wrote…

FIFO > "first in, first out" ? Or, if you want to use the acronym, introduce the acronym in the above mention (something like "first in, first out (FIFO)")

Ah yes - did just that, thanks!


v20.1/architecture/transaction-layer.md, line 207 at r3 (raw file):

Previously, lnhsingh (Lauren Hirata Singh) wrote…

suggestion:

...; that is, they are placed in some consistent order.

Fixed.


v20.1/architecture/transaction-layer.md, line 213 at r3 (raw file):

Previously, lnhsingh (Lauren Hirata Singh) wrote…

What's a mutex? Should this be linked to an article about mutexes? Or is it an acronym that can be spelled out for easier understanding (or is it a common word that I just don't know)?

In this context I think it's somewhat common, since it's a type of lock - but I linked to the wikipedia article on it to be sure. We shouldn't expect every reader to know every term, and it's easy to provide the link.

Fixes #6571.

Summary of changes (all to 'Transaction Layer' page):

- Update 'Writes and reads' section to note that there are now *two*
  types of locks: write intents, and unreplicated locks managed by the
  concurrency manager in a per-node lock table (about which more below).

- Update 'Write intents' section with information about new concurrency
  control mechanism using unreplicated lock table.

- Add new 'Concurrency control' section describing concurrency manager
  and lock table at a high level, with links to `SELECT FOR UPDATE`
  docs.  This is followed by more detailed subsections that are mostly
  adapted from the in-code comments in
  `pkg/kv/kvserver/concurrency/concurrency_control.go`.  Also, move
  'Latch manager' section here, since it's being overseen by the
  concurrency manager.
@rmloveland rmloveland force-pushed the 20200318-concurrency-manager-and-lock-table branch from 62a37da to fe055dc Compare April 13, 2020 20:16
@lnhsingh lnhsingh self-requested a review April 13, 2020 22:31
Copy link
Contributor

@lnhsingh lnhsingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif, @lnhsingh, and @rmloveland)

@rmloveland rmloveland merged commit b4181be into master Apr 14, 2020
@rmloveland rmloveland deleted the 20200318-concurrency-manager-and-lock-table branch April 14, 2020 15:39
Successfully merging this pull request may close these issues.

Document new approach to locking and queueing in the key value layer
5 participants