
Update pagination docs to use keyset / seek method #6114

Merged 1 commit into master on Jan 9, 2020

Conversation

rmloveland
Contributor

Summary of changes:

  • Explain difference between keyset pagination and LIMIT/OFFSET
  • Show examples of the former being fast and the latter being slow
  • Show how to use EXPLAIN to check why the difference exists
  • Add warning to LIMIT/OFFSET docs recommending keyset pagination
  • ... all of the above for 19.1, 19.2, 20.1 docs

Fixes #3743
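The difference the PR documents can be sketched as follows. This is an illustrative demo using SQLite in place of CockroachDB (table and column names here are made up, not from the docs): keyset pagination seeks directly to the next page via the index, while LIMIT/OFFSET must scan and discard every skipped row.

```python
import sqlite3

# Illustrative sketch of the two pagination styles this PR documents,
# using SQLite in place of CockroachDB (names here are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(i, f"user-{i}") for i in range(1, 101)])

PAGE_SIZE = 10

# LIMIT/OFFSET pagination: the database must produce and discard all
# OFFSET rows on every page, so later pages get progressively slower.
def offset_page(page_num):
    return conn.execute(
        "SELECT id, name FROM accounts ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, page_num * PAGE_SIZE)).fetchall()

# Keyset (seek) pagination: each page starts where the last one ended,
# so the query stays an index seek no matter how deep you paginate.
def keyset_page(last_seen_id):
    return conn.execute(
        "SELECT id, name FROM accounts WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, PAGE_SIZE)).fetchall()

first = keyset_page(0)              # first page: everything after id 0
second = keyset_page(first[-1][0])  # next page: seek past the last id seen
print(second[0][0])  # -> 11
```

Both functions return the same rows here; the docs change is about showing (via EXPLAIN) why the keyset form stays fast as the offset grows.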

@cockroach-teamcity
Member

This change is Reviewable

@rmloveland force-pushed the 20191203-keyset-pagination branch from 21f7abb to 57ffcc4 on December 5, 2019 20:50
@rmloveland
Contributor Author

@jordanlewis and @rafiss

Since I think this pagination content is "app-dev" related:

Who's the best engineer to ask for a review? Lots of opinions on the linked issue's thread.

@rafiss
Contributor

rafiss commented Dec 6, 2019

I can do a review of this (at least from an app dev perspective).


@rafiss left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rmloveland)


v19.1/selection-queries.md, line 338 at r1 (raw file):

~~~

### Paginate through limited results

i wonder if our docs should include a caveat somewhere that one should be careful when paginating over a set of records that may change during the process of pagination. depending on the use case, this might be OK, but if additional rows are being added while the app is trying to paginate through everything, both of the methods we talk about here allow the possibility of skipping over some records or seeing duplicate records.
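The skipping/duplication caveat above can be reproduced concretely. A hypothetical demo (SQLite stands in for CockroachDB; the table name is made up): a row inserted between page fetches shifts the OFFSET window, so one record shows up on two pages.

```python
import sqlite3

# Demo of the caveat: a row inserted mid-pagination shifts OFFSET-based
# pages, causing a record to be returned twice (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO events VALUES (?)", [(i,) for i in range(10, 20)])

def page(n, size=3):
    return [r[0] for r in conn.execute(
        "SELECT id FROM events ORDER BY id LIMIT ? OFFSET ?", (size, n * size))]

page0 = page(0)                                # ids 10, 11, 12
conn.execute("INSERT INTO events VALUES (1)")  # new row sorts before page 0
page1 = page(1)                                # ids 12, 13, 14: id 12 repeats
print(sorted(set(page0) & set(page1)))  # -> [12]
```

Keyset pagination avoids this particular duplicate (the `WHERE id > last_seen` predicate never revisits earlier keys), but it can still miss rows inserted behind the cursor, which is why the thread below turns to snapshot-style reads.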


v19.1/selection-queries.md, line 370 at r1 (raw file):

~~~

To get the first page of results using keyset pagination, run:

we could include something about how you would get the first page of results when you don't know what the minimum value of the key is. that is, either select min(key) from table or use a known min value for the data type in question.
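Both variants of that suggestion can be sketched briefly (again using SQLite for illustration; the table and column names are invented): either query `min(key)` first and page inclusively from it, or just omit the `WHERE` clause on the first page.

```python
import sqlite3

# Sketch of the suggestion above: when the minimum key is unknown,
# fetch it first, or simply omit the WHERE clause on the first page.
# Table and column names here are illustrative, not from the docs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (key INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(k,) for k in (42, 57, 99)])

# Option 1: look up the minimum key, then page inclusively from it.
(min_key,) = conn.execute("SELECT min(key) FROM t").fetchone()
first_page = conn.execute(
    "SELECT key FROM t WHERE key >= ? ORDER BY key LIMIT 2",
    (min_key,)).fetchall()

# Option 2: skip the WHERE clause entirely for page one.
also_first = conn.execute("SELECT key FROM t ORDER BY key LIMIT 2").fetchall()
print(first_page == also_first)  # -> True
```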


v19.1/selection-queries.md, line 472 at r1 (raw file):

O(1)

shouldn't this be O(log(n))? my knowledge of index implementation could be rusty


v19.1/selection-queries.md, line 491 at r1 (raw file):

ordered

i think "ordered" might not be the right word here. perhaps "sequential"


@rmloveland left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rafiss)


v19.1/selection-queries.md, line 338 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

i wonder if our docs should include a caveat somewhere that one should be careful when paginating over a set of records that may change during the process of pagination. depending on the use case, this might be OK, but if additional rows are being added while the app is trying to paginate through everything, both of the methods we talk about here allow the possibility of skipping over some records or seeing duplicate records.

That makes sense.

Is there a "Right Way" to mitigate this issue? Or is it specific to your use case / application? ("it depends")

E.g., @knz mentioned using AS OF SYSTEM TIME in the discussion on #3743, but as another person pointed out, that may not be acceptable in every case.

In other words, is there some way I should update the example SQL? Or should we just add a note saying "be aware of this, and mitigate it"?


v19.1/selection-queries.md, line 370 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

we could include something about how you would get the first page of results when you don't know what the minimum value of the key is. that is, either select min(key) from table or use a known min value for the data type in question.

Fixed by adding a note with that information (across all 3 versions in this PR: 19.1, 19.2, 20.1)


v19.1/selection-queries.md, line 472 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…
O(1)

shouldn't this be O(log(n))? my knowledge of index implementation could be rusty

I'll remove this altogether. I was going for something more conceptual, and it may have been a mistake to introduce notation that implies this level of precision.


v19.1/selection-queries.md, line 491 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…
ordered

i think "ordered" might not be the right word here. perhaps "sequential"

Fixed by changing to "sequential".

@rmloveland
Contributor Author

Thanks for the review @rafiss - I addressed your feedback in every case but one, where I had another question.


@knz left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)


v19.1/selection-queries.md, line 338 at r1 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

That makes sense.

Is there a "Right Way" to mitigate this issue? Or is it specific to your use case / application? ("it depends")

E.g., @knz mentioned using AS OF SYSTEM TIME in the discussion on #3743, but as another person pointed out, that may not be acceptable in every case.

In other words, is there some way I should update the example SQL? Or should we just add a note saying "be aware of this, and mitigate it"?

So I think this conversation should be grounded properly by referring to what users usually expect, that is, SQL cursors.
CockroachDB does not support these, but they are useful for setting expectations.

A cursor is not much more than a long-standing transaction where the client can request more data from the same query through multiple (separate) statements.

Importantly, cursors operate over a snapshot of the database at the moment the cursor is opened (it's possible to open non-consistent cursors but it's rarely used and also not always supported).

So folks used to cursors already expect that pagination operates over a snapshot of the database established when the query starts.

Bringing this back to CockroachDB, this means that it is a-OK to provide users with a feature that "anchors" the paginated read at some point in time, possibly slightly in the past (as long as it's not further in the past than the most recent statement in the same session).

My opinion remains that proper pagination should be done using AOST with a follower read timestamp: this guarantees there won't be concurrent updates (and thus no risk of txn retries during the pagination), and also enables reads to access other replicas than the leaseholder. If necessary we can provide a feature to make such a paginated statement automatically wait for a small delay before starting the read, to ensure it will capture all the latest writes by the same session (or direct users to use pg_sleep() before their AOST query to achieve the same manually).

@rmloveland
Contributor Author

rmloveland commented Dec 16, 2019 via email

@rafiss
Contributor

rafiss commented Dec 16, 2019

I agree about recommending AS OF SYSTEM TIME. It also would not hurt to include a note that without AS OF SYSTEM TIME, the application should be aware that there could be missing/duplicated data during pagination.


@rafiss left a comment


Reviewed 1 of 6 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)

@rmloveland
Contributor Author

@rafiss thanks again for your review and comments. I think the stuff you mentioned is in the latest push.

@knz since you shared some specific opinions, would you mind taking a look at the updates I just pushed? here is a direct link to the patch.

I tried to capture the gist of what you said without bringing follower reads into it, since those are an enterprise feature. I went with '-1m' instead, since if I'm reading it right, the timestamp currently returned by experimental_follower_read_timestamp() is about 48 seconds in the past.
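The resulting query shape can be sketched without a database: every page of a keyset pagination carries the same AS OF SYSTEM TIME interval, so all pages read from one snapshot. This is only a sketch of the pattern discussed in this thread; the table and column names, the helper function, and the `$1`/`$2` placeholders are hypothetical, not taken from the docs patch.

```python
# Sketch (no database required) of the query shape discussed above:
# pin every page of a keyset pagination to one AS OF SYSTEM TIME
# interval so all pages see the same snapshot. The table name, key
# column, and helper function here are hypothetical.
def keyset_page_query(table, key_col, aost_interval="'-1m'"):
    return (
        f"SELECT * FROM {table} "
        f"AS OF SYSTEM TIME {aost_interval} "
        f"WHERE {key_col} > $1 ORDER BY {key_col} LIMIT $2"
    )

q = keyset_page_query("accounts", "id")
print(q)
```

The client would bind the last key seen to `$1` and the page size to `$2` on each request, reusing the same interval for the whole pagination session.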


@knz left a comment


Looks good to me. You may want to upsell Enterprise to the doc reader by mentioning follower reads as an alternative to a fixed interval.

@jseldess
Contributor

jseldess commented Jan 5, 2020

@rmloveland, looks like you can merge now?

@rmloveland force-pushed the 20191203-keyset-pagination branch from b7c31ee to 5cf73c6 on January 9, 2020 16:05
@rmloveland merged commit f83d2af into master on January 9, 2020
@rmloveland deleted the 20191203-keyset-pagination branch on January 9, 2020 16:12
Successfully merging this pull request may close these issues.

Make client-side pagination a true recommendation