sql: support FOR {UPDATE,SHARE} SKIP LOCKED #83627
Closed
nvanbenschoten wants to merge 1 commit into cockroachdb:master from nvanbenschoten:nvanbenschoten/skipLockedSQL
nvanbenschoten force-pushed the nvanbenschoten/skipLockedSQL branch from b36d389 to 4686198 on June 29, 2022 23:53
craig bot pushed a commit that referenced this pull request on Jun 30, 2022
79134: kv: support FOR {UPDATE,SHARE} SKIP LOCKED r=arulajmani a=nvanbenschoten

KV portion of #40476.

Assists #62734.
Assists #72407.
Assists #78564.

**NOTE: the SQL changes here were extracted from this PR and moved to #83627. This allows us to land the KV portion of this change without exposing it yet.**

```sql
CREATE TABLE kv (k INT PRIMARY KEY, v INT)
INSERT INTO kv VALUES (1, 1), (2, 2), (3, 3)

-- in session 1
BEGIN; UPDATE kv SET v = 0 WHERE k = 1 RETURNING *

  k | v
----+----
  1 | 0

-- in session 2
BEGIN; SELECT * FROM kv ORDER BY k LIMIT 1 FOR UPDATE SKIP LOCKED

  k | v
----+----
  2 | 2

-- in session 3
BEGIN; SELECT * FROM kv FOR UPDATE SKIP LOCKED

  k | v
----+----
  3 | 3
```

These semantics closely match those of FOR {UPDATE,SHARE} SKIP LOCKED in PostgreSQL. With SKIP LOCKED, any selected rows that cannot be immediately locked are skipped. Skipping locked rows provides an inconsistent view of the data, so this is not suitable for general purpose work, but can be used to avoid lock contention with multiple consumers accessing a queue-like table.

[Here](https://www.pgcasts.com/episodes/the-skip-locked-feature-in-postgres-9-5) is a short video that explains why users might want to use SKIP LOCKED in Postgres. The same motivation applies to CockroachDB. However, SKIP LOCKED is not a complete solution to queues, as MVCC garbage will still become a major problem with sufficiently high consumer throughput. Even with a very low gc.ttl, CockroachDB does not garbage collect MVCC garbage fast enough to avoid slowing down consumers that scan from the head of a queue over MVCC tombstones of previously consumed queue entries.

----

### Implementation

Skip locked has a number of touchpoints in Storage and KV. To understand these, we first need to understand the isolation model of skip-locked. When a request is using a SkipLocked wait policy, it behaves as if run at a weaker isolation level for any keys that it skips over. If the read request does not return a key, it does not make a claim about whether that key does or does not exist or what the key's value was at the read's MVCC timestamp. Instead, it only makes a claim about the set of keys that are returned. For those keys which were not skipped and were returned (and often locked, if combined with a locking strength, though this is not required), serializable isolation is enforced.

When the `pebbleMVCCScanner` is configured with the skipLocked option, it does not include locked keys in the result set. To support this, the MVCC layer needs to be provided access to the in-memory lock table, so that it can determine whether keys are locked with an unreplicated lock. Replicated locks are represented as intents, which will be skipped over in getAndAdvance.

Requests using the SkipLocked wait policy acquire the same latches as before and wait on all latches ahead of them in line. However, if a request is using a SkipLocked wait policy, we always perform optimistic evaluation. In Replica.collectSpansRead, SkipLocked reads are able to constrain their read spans down to point reads on just those keys that were returned and were not already locked. This means that there is a good chance that some or all of the write latches that the SkipLocked read would have blocked on won't overlap with the keys that the request ends up returning, so they won't conflict when checking for optimistic conflicts.

Skip locked requests do not scan the lock table when initially sequencing. Instead, they capture a snapshot of the in-memory lock table while sequencing and scan the lock table as they perform their MVCC scan using the btree snapshot stored in the concurrency guard. MVCC was taught about skip locked in the previous commit.

Skip locked requests add point reads for each of the keys returned to the timestamp cache, instead of adding a single ranged read. This satisfies the weaker isolation level of skip locked. Because the issuing transaction is not intending to enforce serializable isolation across keys that were skipped by its request, it does not need to prevent writes below its read timestamp to keys that were skipped.

Similarly, skip locked requests only record refresh spans for the individual keys returned, instead of recording a refresh span across the entire read span. Because the issuing transaction is not intending to enforce serializable isolation across keys that were skipped by its request, it does not need to validate that they have not changed if the transaction ever needs to refresh.

----

### Benchmarking

I haven't done any serious benchmarking with SKIP LOCKED yet, though I'd like to. At some point, I would like to build a simple queue-like workload into the `workload` tool and experiment with various consumer access patterns (non-locking reads, locking reads, skip-locked reads), indexing schemes, concurrency levels (for producers and consumers), and batch sizes.

82915: sql: add locality to system.sql_instances table r=rharding6373 a=rharding6373

This PR adds the column `locality` to the `system.sql_instances` table that contains the locality (e.g., region) of a SQL instance. The encoded locality is a string representing the `roachpb.Locality` that may have been provided when the instance was created.

This change also pipes the locality through `InstanceInfo`. This will allow us to determine and use locality information of other SQL instances, e.g. in DistSQL for multi-tenant locality-awareness distribution planning.

Informs: #80678

Release note (sql change): Table `system.sql_instances` has a new column, `locality`, that stores the locality of a SQL instance if it was provided when the instance was started. This exposes a SQL instance's locality to other instances in the cluster for query planning.

83418: loopvarcapture: do not flag `defer` within local closure r=srosenberg,dhartunian a=renatolabs

Previously, handling of `defer` statements in the `loopvarcapture` linter was naive: whenever a `defer` statement in the body of a loop referenced a loop variable, the linter would flag it as an invalid reference. However, that can be overly restrictive, as a relatively common idiom is to create literal functions and immediately call them so as to take advantage of `defer` semantics, as in the example below:

```go
for _, n := range numbers {
	// ...
	func() {
		// ...
		defer func() { doSomething(n) }() // always safe
		// ...
	}()
}
```

The above reference is valid because it is guaranteed to be called with the correct value for the loop variable.

A similar scenario occurs when a closure is assigned to a local variable for use within the loop:

```go
for _, n := range numbers {
	// ...
	helper := func() {
		// ...
		defer func() { doSomething(n) }()
		// ...
	}
	// ...
	helper() // always safe
}
```

In the snippet above, calling the `helper` function is also always safe because the `defer` statement is scoped to the closure containing it. However, it is still *not* safe to call the helper function within a Go routine.

This commit updates the `loopvarcapture` linter to recognize when a `defer` statement is safe because it is contained in a local closure. The two cases illustrated above will no longer be flagged, allowing for that idiom to be used freely.

Release note: None.

83545: sql/schemachanger: move end to end testing to one test per-file r=fqazi a=fqazi

Previously, we allowed multiple tests per-file for end-to-end testing inside the declarative schema changer. This was inadequate because we plan on extending the end-to-end testing to start injecting additional read/write operations at different stages, which would make it difficult. To address this, this patch will split tests into individual files, with one test per file. Additionally, it extends support to allow multiple statements per-test statement, for transaction support testing (this is currently unused).

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: rharding6373 <rharding6373@users.noreply.github.com>
Co-authored-by: Renato Costa <renato@cockroachlabs.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
Now that KV supports this, we can pass the wait policy through.

Release note (sql change): SELECT ... FOR {UPDATE,SHARE} SKIP LOCKED is now supported. The option can be used to skip rows that cannot be immediately locked instead of blocking on contended row-level lock acquisition.
nvanbenschoten force-pushed the nvanbenschoten/skipLockedSQL branch from 4686198 to 1c2f2ff on June 30, 2022 18:27
craig bot pushed a commit that referenced this pull request on Aug 10, 2022
85720: sql: parser and optimizer support FOR {UPDATE,SHARE} SKIP LOCKED r=rytaft a=rytaft

This PR adds support for `SKIP LOCKED` by building on commits from #83627 and #82188. See below for details.

**sql: enable the skip-locked wait policy**

Now that KV supports this, we can pass the wait policy through.

Fixes #40476

Release note (sql change): `SELECT ... FOR {UPDATE,SHARE} SKIP LOCKED` is now supported. The option can be used to skip rows that cannot be immediately locked instead of blocking on contended row-level lock acquisition.

**opt: optimizer updates for support of SKIP LOCKED**

For queries using `SELECT FOR {SHARE,UPDATE} SKIP LOCKED`, we need to disable optimizations that depend on preserved-multiplicity consistency of tables. When `SKIP LOCKED` is used, we will no longer use optimizations that assume:
- a PK row exists for every secondary index row
- a PK row exists for every referencing FK (if the PK table uses `SKIP LOCKED`)

One result of this change is that we will no longer push limits into index joins if the primary index uses locking wait policy `SKIP LOCKED`.

This commit also disallows use of multiple column families in tables scanned with `SKIP LOCKED`, since it could result in returning partial rows.

Release note: None

Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
NOTE: this includes a draft of the final commit from #79134, which hooks KV support up for SKIP LOCKED to the corresponding SQL. Pulling that commit out allows us to land the KV portion of the PR.
Fixes #40476.
Assists #62734.
Assists #72407.
Assists #78564.
These semantics closely match those of FOR {UPDATE,SHARE} SKIP LOCKED in PostgreSQL. With SKIP LOCKED, any selected rows that cannot be immediately locked are skipped. Skipping locked rows provides an inconsistent view of the data, so this is not suitable for general purpose work, but can be used to avoid lock contention with multiple consumers accessing a queue-like table.
[Here](https://www.pgcasts.com/episodes/the-skip-locked-feature-in-postgres-9-5) is a short video that explains why users might want to use SKIP LOCKED in Postgres. The same motivation applies to CockroachDB. However, SKIP LOCKED is not a complete solution to queues, as MVCC garbage will still become a major problem with sufficiently high consumer throughput. Even with a very low gc.ttl, CockroachDB does not garbage collect MVCC garbage fast enough to avoid slowing down consumers that scan from the head of a queue over MVCC tombstones of previously consumed queue entries.
Implementation
Skip locked has a number of touchpoints in Storage and KV. To understand these, we first need to understand the isolation model of skip-locked. When a request is using a SkipLocked wait policy, it behaves as if run at a weaker isolation level for any keys that it skips over. If the read request does not return a key, it does not make a claim about whether that key does or does not exist or what the key's value was at the read's MVCC timestamp. Instead, it only makes a claim about the set of keys that are returned. For those keys which were not skipped and were returned (and often locked, if combined with a locking strength, though this is not required), serializable isolation is enforced.
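To make the isolation model concrete, here is a minimal toy sketch in Go of a scan that skips keys locked by other transactions and only "claims" the rows it returns. It is not CockroachDB code: `row`, `skipLockedScan`, and the map-based table and lock set are all hypothetical names invented for illustration. The setup mirrors the three-session example from the #79134 commit message.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a toy key/value pair; it is not a CockroachDB type.
type row struct {
	K, V int
}

// skipLockedScan returns up to limit rows in key order, skipping any row whose
// key is locked by a different transaction, and records a lock on each row it
// returns. A limit <= 0 means "no limit". All names here are hypothetical.
func skipLockedScan(table map[int]int, locks map[int]string, txn string, limit int) []row {
	keys := make([]int, 0, len(table))
	for k := range table {
		keys = append(keys, k)
	}
	sort.Ints(keys)

	var out []row
	for _, k := range keys {
		if limit > 0 && len(out) == limit {
			break
		}
		if holder, ok := locks[k]; ok && holder != txn {
			// Locked by someone else: skip it and make no claim about it.
			continue
		}
		locks[k] = txn
		out = append(out, row{K: k, V: table[k]})
	}
	return out
}

func main() {
	table := map[int]int{1: 0, 2: 2, 3: 3}
	locks := map[int]string{1: "session1"} // session 1 has updated k=1

	fmt.Println(skipLockedScan(table, locks, "session2", 1)) // [{2 2}]
	fmt.Println(skipLockedScan(table, locks, "session3", 0)) // [{3 3}]
}
```

Session 3 sees only `k=3` even though rows 1 and 2 exist; the scan says nothing about the skipped keys, which is the "weaker isolation" described above.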
When the `pebbleMVCCScanner` is configured with the skipLocked option, it does not include locked keys in the result set. To support this, the MVCC layer needs to be provided access to the in-memory lock table, so that it can determine whether keys are locked with an unreplicated lock. Replicated locks are represented as intents, which will be skipped over in getAndAdvance.

Requests using the SkipLocked wait policy acquire the same latches as before and wait on all latches ahead of them in line. However, if a request is using a SkipLocked wait policy, we always perform optimistic evaluation. In Replica.collectSpansRead, SkipLocked reads are able to constrain their read spans down to point reads on just those keys that were returned and were not already locked. This means that there is a good chance that some or all of the write latches that the SkipLocked read would have blocked on won't overlap with the keys that the request ends up returning, so they won't conflict when checking for optimistic conflicts.
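The effect of narrowing the read spans can be sketched with a toy overlap check. This is an illustration under assumptions, not the actual latch manager: `span`, `pointSpans`, and `conflicts` are hypothetical names, keys are plain strings, and a span with an empty `End` stands for a point key.

```go
package main

import "fmt"

// span is a toy half-open key range [Start, End). An empty End marks a point
// span covering only Start. This is a hypothetical type for illustration.
type span struct {
	Start, End string
}

// end returns the exclusive upper bound of the span.
func (s span) end() string {
	if s.End == "" {
		return s.Start + "\x00" // smallest key strictly greater than Start
	}
	return s.End
}

// overlaps reports whether two spans share at least one key.
func overlaps(a, b span) bool {
	return a.Start < b.end() && b.Start < a.end()
}

// pointSpans narrows a scan down to one point span per returned key.
func pointSpans(returned []string) []span {
	out := make([]span, 0, len(returned))
	for _, k := range returned {
		out = append(out, span{Start: k})
	}
	return out
}

// conflicts reports whether any read span overlaps any held write latch.
func conflicts(reads, writeLatches []span) bool {
	for _, r := range reads {
		for _, w := range writeLatches {
			if overlaps(r, w) {
				return true
			}
		}
	}
	return false
}

func main() {
	writeLatches := []span{{Start: "a"}}    // a writer holds a latch on key "a"
	full := []span{{Start: "a", End: "d"}}  // the scan's declared span
	narrowed := pointSpans([]string{"b", "c"}) // keys actually returned

	fmt.Println(conflicts(full, writeLatches))     // true
	fmt.Println(conflicts(narrowed, writeLatches)) // false
}
```

The full scan span collides with the writer's latch on `a`, but once the read is constrained to the keys it returned, the optimistic conflict check in this sketch finds nothing to wait on.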
Skip locked requests do not scan the lock table when initially sequencing. Instead, they capture a snapshot of the in-memory lock table while sequencing and scan the lock table as they perform their MVCC scan using the btree snapshot stored in the concurrency guard. MVCC was taught about skip locked in the previous commit.
Skip locked requests add point reads for each of the keys returned to the timestamp cache, instead of adding a single ranged read. This satisfies the weaker isolation level of skip locked. Because the issuing transaction is not intending to enforce serializable isolation across keys that were skipped by its request, it does not need to prevent writes below its read timestamp to keys that were skipped.
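As a rough illustration of why point reads are enough here, the following toy timestamp cache compares a single ranged read against per-key point reads. It is a simplified sketch with integer timestamps; `tsCache`, `addRange`, `addPoints`, and `writeTS` are hypothetical names and do not reflect the real timestamp cache API.

```go
package main

import "fmt"

// readSpan records that [Start, End) was read at timestamp TS. An empty End
// marks a point read of Start. Hypothetical type for illustration only.
type readSpan struct {
	Start, End string
	TS         int
}

func (r readSpan) contains(key string) bool {
	if r.End == "" {
		return key == r.Start
	}
	return r.Start <= key && key < r.End
}

// tsCache is a toy stand-in for a timestamp cache.
type tsCache struct {
	reads []readSpan
}

// addRange records one ranged read, as a normal scan would.
func (c *tsCache) addRange(start, end string, ts int) {
	c.reads = append(c.reads, readSpan{Start: start, End: end, TS: ts})
}

// addPoints records one point read per returned key, as described for
// skip locked requests.
func (c *tsCache) addPoints(keys []string, ts int) {
	for _, k := range keys {
		c.reads = append(c.reads, readSpan{Start: k, TS: ts})
	}
}

// writeTS returns the timestamp a write to key must use: the proposed
// timestamp, pushed above any recorded read of that key at or after it.
func (c *tsCache) writeTS(key string, proposed int) int {
	ts := proposed
	for _, r := range c.reads {
		if r.contains(key) && r.TS >= ts {
			ts = r.TS + 1
		}
	}
	return ts
}

func main() {
	ranged := &tsCache{}
	ranged.addRange("a", "d", 10)

	points := &tsCache{}
	points.addPoints([]string{"b", "c"}, 10) // "a" was skipped

	fmt.Println(ranged.writeTS("a", 5)) // 11: pushed above the ranged read
	fmt.Println(points.writeTS("a", 5)) // 5: skipped key is not protected
	fmt.Println(points.writeTS("b", 5)) // 11: returned key is protected
}
```

A write below the read timestamp to the skipped key `a` goes through unchanged in the point-read cache, while writes to returned keys are still pushed.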
Similarly, skip locked requests only record refresh spans for the individual keys returned, instead of recording a refresh span across the entire read span. Because the issuing transaction is not intending to enforce serializable isolation across keys that were skipped by its request, it does not need to validate that they have not changed if the transaction ever needs to refresh.
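The refresh behavior can be sketched the same way. In this toy model, a refresh from one timestamp to a later one succeeds only if no recorded refresh span saw a write in between. `refreshSpan`, `write`, and `canRefresh` are hypothetical names, and timestamps are plain integers; the real refresh machinery is more involved.

```go
package main

import "fmt"

// refreshSpan is a toy half-open key range [Start, End); an empty End marks a
// single key. Hypothetical type for illustration only.
type refreshSpan struct {
	Start, End string
}

func (s refreshSpan) contains(key string) bool {
	if s.End == "" {
		return key == s.Start
	}
	return s.Start <= key && key < s.End
}

// write is a toy committed write to Key at timestamp TS.
type write struct {
	Key string
	TS  int
}

// canRefresh reports whether a transaction can move its read timestamp from
// `from` to `to`: no write in (from, to] may fall inside a recorded span.
func canRefresh(spans []refreshSpan, writes []write, from, to int) bool {
	for _, w := range writes {
		if w.TS <= from || w.TS > to {
			continue
		}
		for _, s := range spans {
			if s.contains(w.Key) {
				return false
			}
		}
	}
	return true
}

func main() {
	// Another transaction wrote the skipped key "a" at timestamp 15.
	writes := []write{{Key: "a", TS: 15}}

	ranged := []refreshSpan{{Start: "a", End: "d"}} // whole read span
	points := []refreshSpan{{Start: "b"}, {Start: "c"}} // returned keys only

	fmt.Println(canRefresh(ranged, writes, 10, 20)) // false
	fmt.Println(canRefresh(points, writes, 10, 20)) // true
}
```

With per-key refresh spans, the later write to the skipped key does not block the refresh in this sketch, matching the reasoning above.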
Benchmarking
I haven't done any serious benchmarking with SKIP LOCKED yet, though I'd like to. At some point, I would like to build a simple queue-like workload into the `workload` tool and experiment with various consumer access patterns (non-locking reads, locking reads, skip-locked reads), indexing schemes, concurrency levels (for producers and consumers), and batch sizes.

Release note (sql change): SELECT ... FOR {UPDATE,SHARE} SKIP LOCKED is now supported. The option can be used to skip rows that cannot be immediately locked instead of blocking on contended row-level lock acquisition.