sql: support FOR {UPDATE,SHARE} {SKIP LOCKED,NOWAIT} #40476

Closed
rafiss opened this issue Sep 4, 2019 · 18 comments · Fixed by #85720
Labels: A-sql-pgcompat (Semantic compatibility with PostgreSQL), A-tools-hasura, C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception), T-kv (KV Team), X-anchored-telemetry (The issue number is anchored by telemetry references)

rafiss commented Sep 4, 2019

#6583 and #40206 added support for SELECT FOR UPDATE. There is additional syntax to support: SKIP LOCKED and NOWAIT. See an example of the syntax here: https://www.postgresql.org/docs/current/sql-select.html


Update, Feb 2021: CockroachDB v20.2 supports the NOWAIT syntax. SKIP LOCKED is still not supported at the time of this writing.
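For reference, a minimal sketch of the locking-clause variants in question (table `t` and its column are hypothetical):

```sql
-- Error out immediately if a selected row is already locked:
SELECT * FROM t WHERE k = 1 FOR UPDATE NOWAIT;
SELECT * FROM t WHERE k = 1 FOR SHARE NOWAIT;

-- Silently skip selected rows that are already locked:
SELECT * FROM t WHERE k = 1 FOR UPDATE SKIP LOCKED;
SELECT * FROM t WHERE k = 1 FOR SHARE SKIP LOCKED;
```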

Epic CRDB-11976

Jira issue: CRDB-5535

rafiss added the C-enhancement, A-sql-pgcompat, and X-anchored-telemetry labels Sep 4, 2019
jordanlewis added a commit to jordanlewis/cockroach that referenced this issue Oct 2, 2019
The spreadsheet we discussed is unwieldy - hard to edit and impossible to keep
up to date. If we write down blacklists in code, then we can use an approach
like this to always have an up to date aggregation.

So far it seems like there's just a lot of unknowns to categorize still.

The output today:

```
=== RUN   TestBlacklists
 648: unknown                                                (unknown)
 493: cockroachdb#5807   (sql: Add support for TEMP tables)
 151: cockroachdb#17511  (sql: support stored procedures)
  86: cockroachdb#26097  (sql: make TIMETZ more pg-compatible)
  56: cockroachdb#10735  (sql: support SQL savepoints)
  55: cockroachdb#32552  (multi-dim arrays)
  55: cockroachdb#26508  (sql: restricted DDL / DML inside transactions)
  52: cockroachdb#32565  (sql: support optional TIME precision)
  39: cockroachdb#243    (roadmap: Blob storage)
  33: cockroachdb#26725  (sql: support postgres' API to handle blob storage (incl lo_creat, lo_from_bytea))
  31: cockroachdb#27793  (sql: support custom/user-defined base scalar (primitive) types)
  24: cockroachdb#12123  (sql: Can't drop and replace a table within a transaction)
  24: cockroachdb#26443  (sql: support user-defined schemas between database and table)
  20: cockroachdb#21286  (sql: Add support for geometric types)
  18: cockroachdb#6583   (sql: explicit lock syntax (SELECT FOR {SHARE,UPDATE} {skip locked,nowait}))
  17: cockroachdb#22329  (Support XA distributed transactions in CockroachDB)
  16: cockroachdb#24062  (sql: 32 bit SERIAL type)
  16: cockroachdb#30352  (roadmap:when CockroachDB  will support cursor?)
  12: cockroachdb#27791  (sql: support RANGE types)
   8: cockroachdb#40195  (pgwire: multiple active result sets (portals) not supported)
   8: cockroachdb#6130   (sql: add support for key watches with notifications of changes)
   5: Expected Failure                                       (unknown)
   5: cockroachdb#23468  (sql: support sql arrays of JSONB)
   5: cockroachdb#40854  (sql: set application_name from connection string)
   4: cockroachdb#35879  (sql: `default_transaction_read_only` should also accept 'on' and 'off')
   4: cockroachdb#32610  (sql: can't insert self reference)
   4: cockroachdb#40205  (sql: add non-trivial implementations of FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, FOR NO KEY SHARE)
   4: cockroachdb#35897  (sql: unknown function: pg_terminate_backend())
   4: cockroachdb#4035   (sql/pgwire: missing support for row count limits in pgwire)
   3: cockroachdb#27796  (sql: support user-defined DOMAIN types)
   3: cockroachdb#3781   (sql: Add Data Type Formatting Functions)
   3: cockroachdb#40476  (sql: support `FOR {UPDATE,SHARE} {SKIP LOCKED,NOWAIT}`)
   3: cockroachdb#35882  (sql: support other character sets)
   2: cockroachdb#10028  (sql: Support view queries with star expansions)
   2: cockroachdb#35807  (sql: INTERVAL output doesn't match PG)
   2: cockroachdb#35902  (sql: large object support)
   2: cockroachdb#40474  (sql: support `SELECT ... FOR UPDATE OF` syntax)
   1: cockroachdb#18846  (sql: Support CIDR column type)
   1: cockroachdb#9682   (sql: implement computed indexes)
   1: cockroachdb#31632  (sql: FK options (deferrable, etc))
   1: cockroachdb#24897  (sql: CREATE OR REPLACE VIEW)
   1: pass?                                                  (unknown)
   1: cockroachdb#36215  (sql: enable setting standard_conforming_strings to off)
   1: cockroachdb#32562  (sql: support SET LOCAL and txn-scoped session variable changes)
   1: cockroachdb#36116  (sql: psychopg: investigate how `'infinity'::timestamp` is presented)
   1: cockroachdb#26732  (sql: support the binary operator: <int> / <float>)
   1: cockroachdb#23299  (sql: support coercing string literals to arrays)
   1: cockroachdb#36115  (sql: psychopg: investigate if datetimetz is being returned instead of datetime)
   1: cockroachdb#26925  (sql: make the CockroachDB integer types more compatible with postgres)
   1: cockroachdb#21085  (sql: WITH RECURSIVE (recursive common table expressions))
   1: cockroachdb#36179  (sql: implicity convert date to timestamp)
   1: cockroachdb#36118  (sql: Cannot parse '24:00' as type time)
   1: cockroachdb#31708  (sql: support current_time)
```

Release justification: non-production change
Release note: None
jordanlewis added a commit to jordanlewis/cockroach that referenced this issue Oct 24, 2019
craig bot pushed a commit that referenced this issue Nov 7, 2019
41252: roachtest: add test that aggregates orm blacklist failures r=jordanlewis a=jordanlewis


Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
@nvanbenschoten

The rest of this was addressed by #43868.

rafiss commented Jan 15, 2020

@nvanbenschoten thanks for adding this to the parser! It looks like #43868 adds code that references this issue (i.e., `unimplementedWithIssueDetailf(40476, ...)`), so I'm going to leave this open. Though if you would prefer, we could also create a new issue and refer to that instead.

rafiss reopened this Jan 15, 2020
@nvanbenschoten

Good point, thanks for re-opening. I was thinking this was now covered by #40205, but that's not the case.

@KoryNunn

Has there been any activity towards adding NOWAIT recently?

@nvanbenschoten

Hi @KoryNunn, unfortunately, NOWAIT is not currently on our short term roadmap. Do you mind elaborating on your use case for the functionality?

@KoryNunn

Sure. We would like to use it to check whether processing should be done on a record. If another process holds a lock on a row, we want to ignore that row, since another process is already working on it.

Currently, without NOWAIT, every process that looks up a row has to wait until the lock is released, see that processing is complete, and then skip the row.

If you have any suggestions for getting equivalent functionality without NOWAIT, I'd be very happy to learn.
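A minimal sketch of the worker pattern described here, assuming a hypothetical `records` table with a `processed` flag; with NOWAIT, the lock attempt fails immediately instead of blocking, so the application can catch the error and move on to another row:

```sql
BEGIN;
SELECT id, payload
  FROM records
 WHERE processed = false
 ORDER BY id
 LIMIT 1
   FOR UPDATE NOWAIT;  -- raises an error right away if another worker holds the lock
-- on success: do the work, mark the row processed, then COMMIT
-- on a lock error: ROLLBACK and move on to another row
COMMIT;
```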

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Aug 5, 2020
Informs cockroachdb#40476.
Informs cockroachdb#51624.

This commit introduces the concept of request "wait policies". A
WaitPolicy specifies the behavior of a request when it encounters
conflicting locks held by other active transactions. The default
behavior is to block until the conflicting lock is released, but other
policies can make sense in special situations.

Within this new formalization, the commit creates two initial wait
policy variants:
```
// Block indicates that if a request encounters a conflicting lock held by
// another active transaction, it should wait for the conflicting lock to be
// released before proceeding.
Block = 0;

// Error indicates that if a request encounters a conflicting lock held by
// another active transaction, it should raise an error instead of blocking.
Error = 1;
```

`Block` is equivalent to the current behavior, so there's no further
action needed for that variant.

However, the `Error` policy is new, and this commit teaches the
`lockTableWaiter` about it. It ensures that when a request with the
`Error` wait policy encounters a conflicting lock, it uses a PUSH_TOUCH
PushRequest to determine whether the lock is abandoned or whether its
holder is still active. If the holder is abandoned, the request cleans
up the lock and proceeds. If the holder is still active, a
WriteIntentError is returned to the client.

This will unblock both of the referenced issues.
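At the SQL level, the `Error` wait policy is roughly what a `NOWAIT` locking clause maps onto; a hedged illustration (table and data are made up):

```sql
-- session 1: acquire a row lock and keep the transaction open
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;

-- session 2: with NOWAIT, the conflicting lock produces an error immediately
-- instead of blocking until session 1 commits or aborts
SELECT * FROM accounts WHERE id = 1 FOR UPDATE NOWAIT;
```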
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Aug 5, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Aug 6, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Aug 7, 2020
craig bot pushed a commit that referenced this issue Aug 7, 2020
52388: kv: add WaitPolicy option to BatchRequest r=nvanbenschoten a=nvanbenschoten


Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>

kocoten1992 commented Feb 17, 2021

I want to share my experience: I'm also building a queue. I thought I would need SELECT FOR UPDATE and SKIP LOCKED, but it turns out I didn't need either of them!

My trick is to use a CTE with a RETURNING clause, for example:

with abc as (
    select id from table_1 where ... limit 1
), xyz as (
    update table_2 set ... returning id
), mnt as (
    delete from table_3 where table_1_id in (select id from table_1) and not exists (select * from xyz)
) select 1;

You get the idea!

Since a CTE lets us run several statements as a single SQL statement, you can drop the explicit transaction (autocommit mode) and still get serializable isolation. That's how I stopped needing these features.

P.S.: Here is another magic trick I've learned with CTEs: selecting from several tables at the same time. For example, you want to select two columns from table_1 and one column from table_2; how can that be done with a single SQL statement? Behold:

with
select_table_1 as (
    select row_to_json(row(a, b)) as result from table_1
),
select_table_2 as (
    select row_to_json(row(c)) as result from table_2
)
select result from select_table_1
union all
select result from select_table_2
-- admittedly the CTEs in this case are redundant; I got carried away and didn't know where else to post this


Nican commented Sep 5, 2021

I am in need of SKIP LOCKED for a ticketing system. I have a scenario where each user can only claim one ticket, and I would like a way for users to claim the first unclaimed ticket in a payment transaction.

Also making queues would be nice.


Nican commented Sep 7, 2021

> Wouldn't this work? `update tickets set user_id = ?, claimed = true where claimed = false returning *`

That is not the only operation inside the transaction. The order must be completed, vouchers may have been claimed, and so on. I would like to keep the atomicity of the whole operation.


Bessonov commented Sep 7, 2021

Transactions are serializable (not linearizable) by design. Do you have an example where that won't work for you? Also, look at the CTE above.

@Nican
Copy link

Nican commented Sep 7, 2021

Thank you for taking the time, @kocoten1992 / @Bessonov.

I just took some time to test things.

The original issue we had is that we were using an in_stock value in a products table. Every time an order was completed, an UPDATE products SET in_stock = in_stock - 1 query would be executed, but we found that the row would be locked, allowing only one transaction to execute at a time. This sometimes caused delays when several orders were being processed at the same time for the same product. (Unfortunately, due to some design decisions, some orders may take up to 3 seconds to process.)

I just ran some tests using the tickets table without "SKIP LOCKED", and while the operations are still serializable, there is the lock/delay issue: each transaction ends up waiting on other transactions to find an available ticket.

Example code:

create table orders(id int primary key, status string);
create table tickets (id int primary key, claimed boolean);
insert into orders values (0, 'unpaid'), (1, 'unpaid');
insert into tickets values (0, false), (1, false);

User 1:

begin transaction;
update orders set status='paid' where id=0;
update tickets set claimed=true where claimed=false limit 1 returning *;

User 2:

begin transaction;
update orders set status='paid' where id=1;
update tickets set claimed=true where claimed=false limit 1 returning *;

User 2 is stuck until User 1 runs commit. While the code functions correctly, it is a bottleneck during periods of high load.
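For comparison, a sketch of how the ticket claim could be written once SKIP LOCKED is available (it was not yet supported in CockroachDB when this comment was written): each session claims the first ticket that is both unclaimed and unlocked instead of queuing behind other transactions.

```sql
BEGIN;
UPDATE tickets SET claimed = true
 WHERE id IN (
   SELECT id FROM tickets
    WHERE claimed = false
    LIMIT 1
      FOR UPDATE SKIP LOCKED
 )
RETURNING *;
-- ... update the corresponding order, then COMMIT
```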


Bessonov commented Sep 8, 2021

Doesn't something like this work for you? (I don't have CRDB on this laptop to test):

with t as (
  select * from tickets where claimed=false limit 1
),
o as (
  update orders set status='paid' where id=0
)
update tickets set claimed=true where id in (select id from t) returning *


otan commented Dec 8, 2021

this blocks hasura

i've put this on KV for now, because i think a lot of the work is in that layer

blathers-crl bot added the T-kv KV Team label Dec 8, 2021
otan mentioned this issue Dec 8, 2021
craig bot pushed a commit that referenced this issue Jun 30, 2022
79134: kv: support FOR {UPDATE,SHARE} SKIP LOCKED r=arulajmani a=nvanbenschoten

KV portion of #40476.
Assists #62734.
Assists #72407.
Assists #78564.

**NOTE: the SQL changes here were extracted from this PR and moved to #83627. This allows us to land the KV portion of this change without exposing it yet.**

```sql
CREATE TABLE kv (k INT PRIMARY KEY, v INT)
INSERT INTO kv VALUES (1, 1), (2, 2), (3, 3)


-- in session 1
BEGIN; UPDATE kv SET v = 0 WHERE k = 1 RETURNING *

  k | v
----+----
  1 | 0


-- in session 2
BEGIN; SELECT * FROM kv ORDER BY k LIMIT 1 FOR UPDATE SKIP LOCKED

  k | v
----+----
  2 | 2


-- in session 3
BEGIN; SELECT * FROM kv FOR UPDATE SKIP LOCKED

  k | v
----+----
  3 | 3
```

These semantics closely match those of FOR {UPDATE,SHARE} SKIP LOCKED in PostgreSQL. With SKIP LOCKED, any selected rows that cannot be immediately locked are skipped. Skipping locked rows provides an inconsistent view of the data, so this is not suitable for general purpose work, but can be used to avoid lock contention with multiple consumers accessing a queue-like table.

[Here](https://www.pgcasts.com/episodes/the-skip-locked-feature-in-postgres-9-5) is a short video that explains why users might want to use SKIP LOCKED in Postgres. The same motivation applies to CockroachDB. However, SKIP LOCKED is not a complete solution to queues, as MVCC garbage will still become a major problem with sufficiently high consumer throughput. Even with a very low gc.ttl, CockroachDB does not garbage collect MVCC garbage fast enough to avoid slowing down consumers that scan from the head of a queue over MVCC tombstones of previously consumed queue entries.

----

### Implementation

Skip locked has a number of touchpoints in Storage and KV. To understand these, we first need to understand the isolation model of skip-locked. When a request is using a SkipLocked wait policy, it behaves as if run at a weaker isolation level for any keys that it skips over. If the read request does not return a key, it does not make a claim about whether that key does or does not exist or what the key's value was at the read's MVCC timestamp. Instead, it only makes a claim about the set of keys that are returned. For those keys which were not skipped and were returned (and often locked, if combined with a locking strength, though this is not required), serializable isolation is enforced.

When the `pebbleMVCCScanner` is configured with the skipLocked option, it does not include locked keys in the result set. To support this, the MVCC layer needs to be provided access to the in-memory lock table, so that it can determine whether keys are locked with an unreplicated lock. Replicated locks are represented as intents, which will be skipped over in getAndAdvance.

Requests using the SkipLocked wait policy acquire the same latches as before and wait on all latches ahead of them in line. However, if a request is using a SkipLocked wait policy, we always perform optimistic evaluation. In Replica.collectSpansRead, SkipLocked reads are able to constrain their read spans down to point reads on just those keys that were returned and were not already locked. This means that there is a good chance that some or all of the write latches that the SkipLocked read would have blocked on won't overlap with the keys that the request ends up returning, so they won't conflict when checking for optimistic conflicts.

Skip locked requests do not scan the lock table when initially sequencing. Instead, they capture a snapshot of the in-memory lock table while sequencing and scan the lock table as they perform their MVCC scan using the btree snapshot stored in the concurrency guard. MVCC was taught about skip locked in the previous commit.

Skip locked requests add point reads for each of the keys returned to the timestamp cache, instead of adding a single ranged read. This satisfies the weaker isolation level of skip locked. Because the issuing transaction is not intending to enforce serializable isolation across keys that were skipped by its request, it does not need to prevent writes below its read timestamp to keys that were skipped.

Similarly, skip locked requests only record refresh spans for the individual keys returned, instead of recording a refresh span across the entire read span. Because the issuing transaction is not intending to enforce serializable isolation across keys that were skipped by its request, it does not need to validate that they have not changed if the transaction ever needs to refresh.

----

### Benchmarking

I haven't done any serious benchmarking of SKIP LOCKED yet, though I'd like to. At some point, I would like to build a simple queue-like workload into the `workload` tool and experiment with various consumer access patterns (non-locking reads, locking reads, skip-locked reads), indexing schemes, concurrency levels (for producers and consumers), and batch sizes.
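As a rough illustration of the queue-like consumer access pattern mentioned above (the `queue` table and its columns are hypothetical), a SKIP LOCKED consumer might dequeue a small batch of rows that no other consumer currently has locked:

```sql
BEGIN;
DELETE FROM queue
 WHERE id IN (
   SELECT id FROM queue
    ORDER BY id
    LIMIT 10
      FOR UPDATE SKIP LOCKED
 )
RETURNING id, payload;
COMMIT;
```

Per the note above, heavy use of this pattern still leaves MVCC tombstones at the head of the queue, so it is not by itself a complete queueing solution.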

82915: sql: add locality to system.sql_instances table r=rharding6373 a=rharding6373

This PR adds the column `locality` to the `system.sql_instances` table
that contains the locality (e.g., region) of a SQL instance. The encoded
locality is a string representing the `roachpb.Locality` that may have
been provided when the instance was created.

This change also pipes the locality through `InstanceInfo`. This will
allow us to determine and use locality information of other SQL
instances, e.g. in DistSQL for multi-tenant locality-awareness
distribution planning.

Informs: #80678

Release note (sql change): Table `system.sql_instances` has a new
column, `locality`, that stores the locality of a SQL instance if it was
provided when the instance was started. This exposes a SQL instance's
locality to other instances in the cluster for query planning.


83418: loopvarcapture: do not flag `defer` within local closure r=srosenberg,dhartunian a=renatolabs

Previously, handling of `defer` statements in the `loopvarcapture`
linter was naive: whenever a `defer` statement in the body of a loop
referenced a loop variable, the linter would flag it as an invalid
reference. However, that can be overly restrictive, as a relatively
common idiom is to create literal functions and immediately call them
so as to take advantage of `defer` semantics, as in the example below:

```go
for _, n := range numbers {
    // ...
    func() {
           // ...
           defer func() { doSomething(n) }() // always safe
           // ...
    }()
}
```

The above reference is valid because it is guaranteed to be called
with the correct value for the loop variable.

A similar scenario occurs when a closure is assigned to a local
variable for use within the loop:

```go
for _, n := range numbers {
    // ...
    helper := func() {
           // ...
           defer func() { doSomething(n) }()
           // ...
    }
    // ...
    helper() // always safe
}
```

In the snippet above, calling the `helper` function is also always
safe because the `defer` statement is scoped to the closure containing
it. However, it is still *not* safe to call the helper function within
a goroutine.

This commit updates the `loopvarcapture` linter to recognize when a
`defer` statement is safe because it is contained in a local
closure. The two cases illustrated above will no longer be flagged,
allowing for that idiom to be used freely.

Release note: None.

83545: sql/schemachanger: move end to end testing to one test per-file r=fqazi a=fqazi

Previously, we allowed multiple tests per file for end-to-end testing
inside the declarative schema changer. This was inadequate because we
plan on extending the end-to-end testing to start injecting additional
read/write operations at different stages, which would make that
difficult. To address this, this patch splits the tests into individual
files, with one test per file. Additionally, it extends support to allow
multiple statements per test statement, for transaction support testing
(this is currently unused).

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: rharding6373 <rharding6373@users.noreply.github.com>
Co-authored-by: Renato Costa <renato@cockroachlabs.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Aug 10, 2022
85720: sql: parser and optimizer support FOR {UPDATE,SHARE} SKIP LOCKED r=rytaft a=rytaft

This PR adds support for `SKIP LOCKED` by building on commits from #83627 and #82188. See below for details.

**sql: enable the skip-locked wait policy**

Now that KV supports this, we can pass the wait policy through.

Fixes #40476

Release note (sql change): `SELECT ... FOR {UPDATE,SHARE} SKIP LOCKED`
is now supported. The option can be used to skip rows that cannot be
immediately locked instead of blocking on contended row-level lock
acquisition.

**opt: optimizer updates for support of SKIP LOCKED**

For queries using `SELECT FOR {SHARE,UPDATE} SKIP LOCKED`, we need to disable
optimizations that depend on preserved-multiplicity consistency of
tables. When `SKIP LOCKED` is used, we will no longer use optimizations that
assume:
- a PK row exists for every secondary index row
- a PK row exists for every referencing FK (if the PK table uses `SKIP LOCKED`)

One result of this change is that we will no longer push limits into index
joins if the primary index uses locking wait policy `SKIP LOCKED`.

This commit also disallows use of multiple column families in tables scanned
with `SKIP LOCKED`, since it could result in returning partial rows.

Release note: None
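A hypothetical illustration of the multiple-column-family restriction described above; per this PR, a SKIP LOCKED scan of a table like this is disallowed because it could return partial rows:

```sql
CREATE TABLE kv2 (
  k INT PRIMARY KEY,
  a INT,
  b INT,
  FAMILY f1 (k, a),
  FAMILY f2 (b)
);

-- per the restriction above, this combination is rejected
SELECT * FROM kv2 FOR UPDATE SKIP LOCKED;
```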

Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>


Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
craig bot closed this as completed in 338d7fe Aug 10, 2022