
Update pagination docs to use keyset / seek method #6114

Merged 1 commit into master on Jan 9, 2020

Conversation

rmloveland
Contributor

Summary of changes:

  • Explain difference between keyset pagination and LIMIT/OFFSET
  • Show examples of the former being fast and the latter being slow
  • Show how to use EXPLAIN to check why the difference exists
  • Add warning to LIMIT/OFFSET docs recommending keyset pagination
  • ... all of the above for 19.1, 19.2, 20.1 docs

Fixes #3743
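The difference the PR documents can be sketched as follows. This is an illustrative demo using SQLite in place of CockroachDB (table and column names here are made up, not from the docs): keyset pagination seeks directly to the next page via the index, while LIMIT/OFFSET must scan and discard every skipped row.

```python
import sqlite3

# Illustrative sketch of the two pagination styles this PR documents,
# using SQLite in place of CockroachDB (names here are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(i, f"user-{i}") for i in range(1, 101)])

PAGE_SIZE = 10

# LIMIT/OFFSET pagination: the database must produce and discard all
# OFFSET rows on every page, so later pages get progressively slower.
def offset_page(page_num):
    return conn.execute(
        "SELECT id, name FROM accounts ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, page_num * PAGE_SIZE)).fetchall()

# Keyset (seek) pagination: each page starts where the last one ended,
# so the query stays an index seek no matter how deep you paginate.
def keyset_page(last_seen_id):
    return conn.execute(
        "SELECT id, name FROM accounts WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, PAGE_SIZE)).fetchall()

first = keyset_page(0)              # first page: everything after id 0
second = keyset_page(first[-1][0])  # next page: seek past the last id seen
print(second[0][0])  # -> 11
```

Both functions return the same rows here; the docs change is about showing (via EXPLAIN) why the keyset form stays fast as the offset grows.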

@cockroach-teamcity
Member

This change is Reviewable

@rmloveland force-pushed the 20191203-keyset-pagination branch from 21f7abb to 57ffcc4 on December 5, 2019 20:50
@rmloveland
Contributor Author

@jordanlewis and @rafiss

Since I think this pagination content is "app-dev" related:

Who's the best engineer to ask for a review? Lots of opinions on the linked issue's thread.

@rafiss
Contributor

rafiss commented Dec 6, 2019

I can do a review of this (at least from an app dev perspective).


@rafiss left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rmloveland)


v19.1/selection-queries.md, line 338 at r1 (raw file):

~~~

### Paginate through limited results

i wonder if our docs should include a caveat somewhere that one should be careful when paginating over a set of records that may change during the process of pagination. depending on the use case, this might be OK, but if additional rows are being added while the app is trying to paginate through everything, both of the methods we talk about here allow the possibility of skipping over some records or seeing duplicate records.
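The skipping/duplication caveat above can be reproduced concretely. A hypothetical demo (SQLite stands in for CockroachDB; the table name is made up): a row inserted between page fetches shifts the OFFSET window, so one record shows up on two pages.

```python
import sqlite3

# Demo of the caveat: a row inserted mid-pagination shifts OFFSET-based
# pages, causing a record to be returned twice (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO events VALUES (?)", [(i,) for i in range(10, 20)])

def page(n, size=3):
    return [r[0] for r in conn.execute(
        "SELECT id FROM events ORDER BY id LIMIT ? OFFSET ?", (size, n * size))]

page0 = page(0)                                # ids 10, 11, 12
conn.execute("INSERT INTO events VALUES (1)")  # new row sorts before page 0
page1 = page(1)                                # ids 12, 13, 14: id 12 repeats
print(sorted(set(page0) & set(page1)))  # -> [12]
```

Keyset pagination avoids this particular duplicate (the `WHERE id > last_seen` predicate never revisits earlier keys), but it can still miss rows inserted behind the cursor, which is why the thread below turns to snapshot-style reads.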


v19.1/selection-queries.md, line 370 at r1 (raw file):

~~~

To get the first page of results using keyset pagination, run:

we could include something about how you would get the first page of results when you don't know what the minimum value of the key is. that is, either select min(key) from table or use a known min value for the data type in question.
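Both variants of that suggestion can be sketched briefly (again using SQLite for illustration; the table and column names are invented): either query `min(key)` first and page inclusively from it, or just omit the `WHERE` clause on the first page.

```python
import sqlite3

# Sketch of the suggestion above: when the minimum key is unknown,
# fetch it first, or simply omit the WHERE clause on the first page.
# Table and column names here are illustrative, not from the docs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (key INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(k,) for k in (42, 57, 99)])

# Option 1: look up the minimum key, then page inclusively from it.
(min_key,) = conn.execute("SELECT min(key) FROM t").fetchone()
first_page = conn.execute(
    "SELECT key FROM t WHERE key >= ? ORDER BY key LIMIT 2",
    (min_key,)).fetchall()

# Option 2: skip the WHERE clause entirely for page one.
also_first = conn.execute("SELECT key FROM t ORDER BY key LIMIT 2").fetchall()
print(first_page == also_first)  # -> True
```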


v19.1/selection-queries.md, line 472 at r1 (raw file):

O(1)

shouldn't this be O(log(n))? my knowledge of index implementation could be rusty


v19.1/selection-queries.md, line 491 at r1 (raw file):

ordered

i think "ordered" might not be the right word here. perhaps "sequential"


@rmloveland left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rafiss)


v19.1/selection-queries.md, line 338 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

i wonder if our docs should include a caveat somewhere that one should be careful when paginating over a set of records that may change during the process of pagination. depending on the use case, this might be OK, but if additional rows are being added while the app is trying to paginate through everything, both of the methods we talk about here allow the possibility of skipping over some records or seeing duplicate records.

That makes sense.

Is there a "Right Way" to mitigate this issue? Or is it specific to your use case / application? ("it depends")

E.g., @knz mentioned using AS OF SYSTEM TIME in the discussion on #3743, but as another person pointed out, that may not be acceptable in every case.

In other words, is there some way I should update the example SQL? Or should we just add a note saying "be aware of this, and mitigate it"?


v19.1/selection-queries.md, line 370 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

we could include something about how you would get the first page of results when you don't know what the minimum value of the key is. that is, either select min(key) from table or use a known min value for the data type in question.

Fixed by adding a note with that information (across all 3 versions in this PR: 19.1, 19.2, 20.1)


v19.1/selection-queries.md, line 472 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…
O(1)

shouldn't this be O(log(n))? my knowledge of index implementation could be rusty

I'll remove this altogether. I was going for something more conceptual, and it may have been a mistake to introduce notation that implies this level of precision.


v19.1/selection-queries.md, line 491 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…
ordered

i think "ordered" might not be the right word here. perhaps "sequential"

Fixed by changing to "sequential".

@rmloveland
Contributor Author

Thanks for the review @rafiss - I addressed your feedback in every case but one, where I had another question.


@knz left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)


v19.1/selection-queries.md, line 338 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

That makes sense.

Is there a "Right Way" to mitigate this issue? Or is it specific to your use case / application? ("it depends")

E.g., @knz mentioned using AS OF SYSTEM TIME in the discussion on #3743, but as another person pointed out, that may not be acceptable in every case.

In other words, is there some way I should update the example SQL? Or should we just add a note saying "be aware of this, and mitigate it"?

So I think this conversation should be grounded properly by referring to what users usually expect, that is, SQL cursors.
CockroachDB does not support these, but they are useful for setting expectations.

A cursor is not much more than a long-standing transaction where the client can request more data from the same query through multiple (separate) statements.

Importantly, cursors operate over a snapshot of the database at the moment the cursor is opened (it's possible to open non-consistent cursors but it's rarely used and also not always supported).

So folks used to cursors already expect that pagination operates over a snapshot of the database established when the query starts.

Bringing this back to CockroachDB, this means that it is a-OK to provide users with a feature that "anchors" the paginated read at some point in time, possibly slightly in the past (as long as it's not further in the past than the most recent statement in the same session).

My opinion remains that proper pagination should be done using AOST with a follower read timestamp: this guarantees there won't be concurrent updates (and thus no risk of txn retries during the pagination), and also enables reads to access other replicas than the leaseholder. If necessary we can provide a feature to make such a paginated statement automatically wait for a small delay before starting the read, to ensure it will capture all the latest writes by the same session (or direct users to use pg_sleep() before their AOST query to achieve the same manually).

@rmloveland
Contributor Author

rmloveland commented Dec 16, 2019 via email

@rafiss
Contributor

rafiss commented Dec 16, 2019

I agree about recommending AS OF SYSTEM TIME. It also would not hurt to include a note that without AS OF SYSTEM TIME, the application should be aware that there could be missing/duplicated data during pagination.


@rafiss left a comment


Reviewed 1 of 6 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)

@rmloveland
Contributor Author

@rafiss thanks again for your review and comments. I think the stuff you mentioned is in the latest push.

@knz since you shared some specific opinions, would you mind taking a look at the updates I just pushed? here is a direct link to the patch.

I tried to capture the gist of what you said without bringing follower reads into it, since those are an enterprise feature. I went with '-1m' instead, since if I'm reading it right, the timestamp currently returned by experimental_follower_read_timestamp() is about 48 seconds in the past.
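The resulting query shape can be sketched without a database: every page of a keyset pagination carries the same AS OF SYSTEM TIME interval, so all pages read from one snapshot. This is only a sketch of the pattern discussed in this thread; the table and column names, the helper function, and the `$1`/`$2` placeholders are hypothetical, not taken from the docs patch.

```python
# Sketch (no database required) of the query shape discussed above:
# pin every page of a keyset pagination to one AS OF SYSTEM TIME
# interval so all pages see the same snapshot. The table name, key
# column, and helper function here are hypothetical.
def keyset_page_query(table, key_col, aost_interval="'-1m'"):
    return (
        f"SELECT * FROM {table} "
        f"AS OF SYSTEM TIME {aost_interval} "
        f"WHERE {key_col} > $1 ORDER BY {key_col} LIMIT $2"
    )

q = keyset_page_query("accounts", "id")
print(q)
```

The client would bind the last key seen to `$1` and the page size to `$2` on each request, reusing the same interval for the whole pagination session.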


@knz left a comment


Looks good to me. You may want to upsell Enterprise to the doc reader by mentioning follower reads as an alternative to a fixed interval.

@jseldess
Contributor

jseldess commented Jan 5, 2020

@rmloveland, looks like you can merge now?

@rmloveland force-pushed the 20191203-keyset-pagination branch from b7c31ee to 5cf73c6 on January 9, 2020 16:05
@rmloveland merged commit f83d2af into master on January 9, 2020
@rmloveland deleted the 20191203-keyset-pagination branch on January 9, 2020 16:12
Successfully merging this pull request may close these issues.

Make client-side pagination a true recommendation