Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document the 'SELECT ... FOR UPDATE' statement #6671

Merged
merged 1 commit into from
Mar 5, 2020

Conversation

rmloveland
Copy link
Contributor

Fixes #6000.

Summary of changes:

  • Add docs for the SELECT ... FOR UPDATE statement, including an
    example showing its intended use.

  • Update other pages re: selection queries to mention the use of SELECT FOR UPDATE to reduce contention/retries and improve
    throughput/latency.

  • Update Transactions page and Perf Best Practices page to mention
    SELECT FOR UPDATE as a possible solution to contention/retry issues.

  • Add SELECT FOR UPDATE to SQL Statements and SQL Feature Support
    pages.

  • Add docs for new enable_implicit_select_for_update session and
    cluster setting.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rmloveland
Copy link
Contributor Author

Related issue to document KV layer changes: #6571

@rmloveland
Copy link
Contributor Author

Nathan this PR is meant to document the user-facing changes related to SELECT FOR UPDATE. We will document the KV-layer changes separately via #6571.

Please take a look at your convenience and let me know if this is missing any pertinent info that SQL users should have re: SFU, or if anything is incorrect. Also, if you have suggestions re: how to better explain this feature, I'm happy to update.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for publishing this Rich!

Do we no longer get previews for these changes?

Reviewed 12 of 12 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rmloveland)


_includes/v20.1/misc/session-vars.html, line 102 at r1 (raw file):

    <code>enable_implicit_select_for_update</code>
   </td>
   <td><span class="version-tag">New in v20.1</span>: Indicates whether <a href="update.html"><code>UPDATE</code></a> statements acquire locks using the <code>FOR UPDATE</code> locking mode during their initial row scan, which improves performance for contended workloads.  For more information about how <code>FOR UPDATE</code> locking works, see the documentation for <a href="select-for-update.html"><code>SELECT FOR UPDATE</code></a>.</td>

This may also include DELETE and UPSERT statements by the 20.1 release. I'm not sure whether you want to mention that now or not.


_includes/v20.1/sql/select-for-update-overview.md, line 5 at r1 (raw file):

Benefits of using `SELECT FOR UPDATE` include:

- Reduced [transaction contention][transaction_contention] when your workload requires concurrent access to the same rows.

Transaction contention is not reduced by SELECT FOR UPDATE. In fact, it's increased.


_includes/v20.1/sql/select-for-update-overview.md, line 7 at r1 (raw file):

- Reduced [transaction contention][transaction_contention] when your workload requires concurrent access to the same rows.
- A substantial reduction the number of [transaction retries][retries] in scenarios where a transaction performs a read and then updates the same row it just read.
- Increased throughput.

We should qualify this further. Neither of these statements is true for uncontended workloads. I think we should remove this list and expand upon the second point more, as it's the one that gets at the heart of why users would want to use SELECT FOR UPDATE.

SELECT FOR UPDATE allows users to acquire pessimistic locks during reads so that they can't be modified between the time that they are read and the time that they are updated. This prevents retries, which occur when operations interleave like this. This also prevents thrashing during these read-modify-write attempts because transactions are forced to queue up during the read operation instead of during the write operation.

All of this is what leads to increased throughput and decreased tail latency for contended operations.

Copy link
Contributor Author

@rmloveland rmloveland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we no longer get previews

AIUI the security team changed perms on our S3 buckets recently, which affected docs builds. According to some discussion in the #education channel, this should be fixed now.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


_includes/v20.1/misc/session-vars.html, line 102 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This may also include DELETE and UPSERT statements by the 20.1 release. I'm not sure whether you want to mention that now or not.

For now I have filed #6699 to track adding DELETE and UPSERT here later as needed.


_includes/v20.1/sql/select-for-update-overview.md, line 5 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Transaction contention is not reduced by SELECT FOR UPDATE. In fact, it's increased.

Interesting. Is that because the read-based queueing you describe below causes more transactions to "stack up" behind the one that holds the lock?


_includes/v20.1/sql/select-for-update-overview.md, line 7 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We should qualify this further. Neither of these statements is true for uncontended workloads. I think we should remove this list and expand upon the second point more, as it's the one that gets at the heart of why users would want to use SELECT FOR UPDATE.

SELECT FOR UPDATE allows users to acquire pessimistic locks during reads so that they can't be modified between the time that they are read and the time that they are updated. This prevents retries, which occur when operations interleave like this. This also prevents thrashing during these read-modify-write attempts because transactions are forced to queue up during the read operation instead of during the write operation.

All of this is what leads to increased throughput and decreased tail latency for contended operations.

Ah, OK. Thanks for this information.

I have updated this section to remove the bulleted list and rewrite things a bit. I tried to simplify the language slightly while still also hitting the points you mentioned. PTAL.

@rmloveland
Copy link
Contributor Author

Also, thanks for the review Nathan! (Since I forgot to say that :-} )

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rmloveland)


_includes/v20.1/sql/select-for-update-overview.md, line 5 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Interesting. Is that because the read-based queueing you describe below causes more transactions to "stack up" behind the one that holds the lock?

Exactly, this is called "pessimistic" locking, as opposed to "optimistic" locking. With SELECT FOR UPDATE, the reads grab locks and queue behind one another instead of executing concurrently. By itself, this would actually hurt performance. However, when performing an UPDATE of a row that was just read, we need to make sure that the row hasn't changed since the read. If we don't grab locks during the read and we're performing contended UPDATEs then there's a good chance that the row has changed and we'll need to restart the entire transaction (ie. we were optimistically hoping the row wouldn't change but doing nothing to prevent that, and it cost us). So SELECT FOR UPDATE is more aggressive (read: pessimistic) with lock, which has a small negative performance impact on its own but can prevent these full txn restarts, which hurt performance significantly more.


_includes/v20.1/sql/select-for-update-overview.md, line 7 at r1 (raw file):

read-modify-write workflow

Consider mentioning what this means in the context of SQL (e.g. a SELECT and then an UPDATE on that select).

Also, this should probably mention that the thrashing only happens with contended read-modify-write operations.

In addition,

The thrashing you refer to in the previous sentence is exactly what causes these retries, so this should be reworded a little bit.


v20.1/select-for-update.md, line 13 at r2 (raw file):

<div>
  {% include {{ page.version.version }}/sql/diagrams/simple_select_clause.html %}

Is this the right diagram to include here? It doesn't actually include any mention of "FOR UPDATE".

Copy link
Contributor Author

@rmloveland rmloveland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


_includes/v20.1/sql/select-for-update-overview.md, line 5 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Exactly, this is called "pessimistic" locking, as opposed to "optimistic" locking. With SELECT FOR UPDATE, the reads grab locks and queue behind one another instead of executing concurrently. By itself, this would actually hurt performance. However, when performing an UPDATE of a row that was just read, we need to make sure that the row hasn't changed since the read. If we don't grab locks during the read and we're performing contended UPDATEs then there's a good chance that the row has changed and we'll need to restart the entire transaction (ie. we were optimistically hoping the row wouldn't change but doing nothing to prevent that, and it cost us). So SELECT FOR UPDATE is more aggressive (read: pessimistic) with lock, which has a small negative performance impact on its own but can prevent these full txn restarts, which hurt performance significantly more.

Thanks for explaining this.


_includes/v20.1/sql/select-for-update-overview.md, line 7 at r1 (raw file):
Re: translating "read-modify-write" to SQL, and specifying that the thrashing only happens with contended read-modify-write operations, I updated that part to say the following:

Because this queueing happens during the read operation, the thrashing that would otherwise occur (when multiple concurrently executing transactions attempt to SELECT the same data and then UPDATE the results of that selection) is prevented.

Re: the thrashing causing the retries, I updated that sentence to say the following:

By preventing this thrashing, CockroachDB also prevents the [transaction retries][retries] that would otherwise occur.

PTAL.


v20.1/select-for-update.md, line 13 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this the right diagram to include here? It doesn't actually include any mention of "FOR UPDATE".

It's not the right diagram. For now, I have deleted this section and diagram from this page (for now) and filed #6720 to get help figuring out how to generate the right diagram.

I notice that the CRDB grammar contains a for_locking_strength section which has what we want (FOR UPDATE), but I'm not sure how to get from there to a railroad diagram. The pkg/cmd/docgen tool is something of a black box (to me); I'll try to figure this out / get help via that other issue.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 4 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


_includes/v20.1/sql/select-for-update-overview.md, line 7 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Re: translating "read-modify-write" to SQL, and specifying that the thrashing only happens with contended read-modify-write operations, I updated that part to say the following:

Because this queueing happens during the read operation, the thrashing that would otherwise occur (when multiple concurrently executing transactions attempt to SELECT the same data and then UPDATE the results of that selection) is prevented.

Re: the thrashing causing the retries, I updated that sentence to say the following:

By preventing this thrashing, CockroachDB also prevents the [transaction retries][retries] that would otherwise occur.

PTAL.

LGTM now, though you might s/when/if/ and pull that out of the parens. If the UPDATEs aren't contended then it's perfectly fine to not use SELECT FOR UPDATE.

Copy link
Contributor Author

@rmloveland rmloveland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


_includes/v20.1/sql/select-for-update-overview.md, line 7 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

LGTM now, though you might s/when/if/ and pull that out of the parens. If the UPDATEs aren't contended then it's perfectly fine to not use SELECT FOR UPDATE.

Done and done (s/when/if/ and paren removal). Thanks!

@rmloveland rmloveland requested a review from lnhsingh March 3, 2020 19:55
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 4 files at r3, 1 of 1 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @lnhsingh)

Copy link
Contributor

@lnhsingh lnhsingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One language suggestion + clean up, but otherwise LGTM

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rmloveland)


_includes/v20.1/sql/select-for-update-overview.md, line 3 at r4 (raw file):

<span class="version-tag">New in v20.1</span>: The `SELECT ... FOR UPDATE` statement is used to order transactions by controlling concurrent access to one or more rows of a table.

It works by causing the rows returned by a [selection query][selection] to be locked, such that other transactions trying to access those rows are forced to wait for the transaction that locked the rows to finish. These other transactions are effectively put into a queue based on when they tried to read the value of the locked rows.

It works by causing the rows returned by a selection query to be locked > It works by locking the rows returned by a selection query,


v20.1/performance-best-practices-overview.md, line 386 at r4 (raw file):

It works by causing the rows returned by a [selection query][selection] to be locked, such that other transactions trying to access those rows are forced to wait for the transaction that locked the rows to finish. These other transactions are effectively put into a queue based on when they tried to read the value of the locked rows.

I think this description should be the same as your select-for-update include above. Something like:

It works by locking the rows returned by a [selection query][selection], such that other transactions trying to access those rows are forced to wait for the transaction that locked the rows to finish. These other transactions are effectively put into a queue based on when they tried to read the value of the locked rows.


v20.1/select-for-update.md, line 14 at r4 (raw file):

The user must have the `SELECT` and `UPDATE` [privileges](authorization.html#assign-privileges) on the tables used as operands.

## Parameters

Should there also be a Synopsis section?


v20.1/selection-queries.md, line 25 at r4 (raw file):

<div>
  {% include {{ page.version.version }}/sql/diagrams/select.html %}

This is rendering with what looks like merge conflict messages - <<<<<<< HEAD and ====== and >>>>>>> Recursive CTE docs. I think there's also two diagrams showing when there should be one.

Fixes #6000.

Summary of changes:

- Add docs for the `SELECT ... FOR UPDATE` statement, including an
  example showing its intended use.

- Update other pages re: selection queries to mention the use of `SELECT
  FOR UPDATE` to reduce retries and improve throughput/latency.

- Update Transactions page and Perf Best Practices page to mention
  `SELECT FOR UPDATE` as a possible solution to thrashing/retry issues.

- Add `SELECT FOR UPDATE` to SQL Statements and SQL Feature Support
  pages.

- Add docs for new `enable_implicit_select_for_update` session and
  cluster setting.
Copy link
Contributor Author

@rmloveland rmloveland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Lauren!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @lnhsingh)


_includes/v20.1/sql/select-for-update-overview.md, line 3 at r4 (raw file):

Previously, lnhsingh (Lauren Hirata Singh) wrote…

It works by causing the rows returned by a selection query to be locked > It works by locking the rows returned by a selection query,

Updated! Much better, thanks.


v20.1/performance-best-practices-overview.md, line 386 at r4 (raw file):

Previously, lnhsingh (Lauren Hirata Singh) wrote…
It works by causing the rows returned by a [selection query][selection] to be locked, such that other transactions trying to access those rows are forced to wait for the transaction that locked the rows to finish. These other transactions are effectively put into a queue based on when they tried to read the value of the locked rows.

I think this description should be the same as your select-for-update include above. Something like:

It works by locking the rows returned by a [selection query][selection], such that other transactions trying to access those rows are forced to wait for the transaction that locked the rows to finish. These other transactions are effectively put into a queue based on when they tried to read the value of the locked rows.

Fixed.


v20.1/select-for-update.md, line 14 at r4 (raw file):

Previously, lnhsingh (Lauren Hirata Singh) wrote…

Should there also be a Synopsis section?

Yes! Unfortunately, I have not been able to figure out how to generate a diagram correctly for this variant of SELECT. Filed #6720 - details about the pain are listed there.


v20.1/selection-queries.md, line 25 at r4 (raw file):

Previously, lnhsingh (Lauren Hirata Singh) wrote…

This is rendering with what looks like merge conflict messages - <<<<<<< HEAD and ====== and >>>>>>> Recursive CTE docs. I think there's also two diagrams showing when there should be one.

Weird! I am not seeing the merge conflict markers in my local repo. I just rebased everything onto latest master with no conflicts. Will check to see if the conflict markers are there after I push this update.

@rmloveland
Copy link
Contributor Author

This is rendering with what looks like merge conflict messages - <<<<<<< HEAD and ====== and >>>>>>> Recursive CTE docs. I think there's also two diagrams showing when there should be one.

This IS there, my bad. I went and looked at the include. It was introduced in a different PR :-/

I will make the fix in a separate PR since I want to drive some discussion of one of my pet issues #2836

@rmloveland rmloveland merged commit b3a7005 into master Mar 5, 2020
@rmloveland rmloveland deleted the 20200213-select-for-update branch March 5, 2020 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable “SELECT FOR UPDATE” Statement
4 participants