Update pagination docs to use keyset / seek method #6114
Conversation
Force-pushed from 21f7abb to 57ffcc4.
@jordanlewis and @rafiss Since I think this pagination content is "app-dev" related: who's the best engineer to ask for a review? There are lots of opinions on the linked issue's thread.
I can do a review of this (at least from an app dev perspective).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rmloveland)
v19.1/selection-queries.md, line 338 at r1 (raw file):
> ### Paginate through limited results
i wonder if our docs should include a caveat somewhere that one should be careful when paginating over a set of records that may change during the process of pagination. depending on the use case, this might be OK, but if additional rows are being added while the app is trying to paginate through everything, both of the methods we talk about here allow the possibility of skipping over some records or seeing duplicate records.
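The caveat above can be shown concretely. The sketch below is an illustrative toy, not from the PR's docs: a `users` table with integer keys, with sqlite3 standing in for any SQL database. It demonstrates how keyset pagination silently skips a row that a concurrent writer inserts behind the cursor while the app is still paginating.

```python
# Toy demonstration (assumed schema) of keyset pagination skipping a row
# inserted behind the cursor mid-pagination. sqlite3 stands in for any
# SQL database here.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in (10, 20, 30, 40)])

seen, cursor, page_size = [], 0, 2
while True:
    # Keyset pagination: seek past the last id the client has seen.
    page = conn.execute(
        "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (cursor, page_size)).fetchall()
    if not page:
        break
    seen.extend(row[0] for row in page)
    cursor = page[-1][0]
    # A concurrent writer inserts id=15 after the first page (10, 20)
    # has been read; the cursor is already past it, so it is skipped.
    if cursor == 20:
        conn.execute("INSERT INTO users VALUES (15, 'late')")

print(seen)  # id 15 never appears: [10, 20, 30, 40]
```

The same effect happens with LIMIT/OFFSET when a row is inserted into an already-read page range, which additionally shifts later offsets and can produce duplicates.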
v19.1/selection-queries.md, line 370 at r1 (raw file):
> To get the first page of results using keyset pagination, run:
We could include something about how you would get the first page of results when you don't know what the minimum value of the key is. That is, either `select min(key) from table`, or use a known min value for the data type in question.
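Both options can be sketched on a toy table (names are illustrative, not from the docs; sqlite3 stands in for the database):

```python
# Two ways to fetch the first page when the minimum key is unknown.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (key INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(5,), (7,), (9,)])

# Option 1: look up the actual minimum first, then page from it
# inclusively (costs one extra round trip).
(min_key,) = conn.execute("SELECT min(key) FROM t").fetchone()
first_page = conn.execute(
    "SELECT key FROM t WHERE key >= ? ORDER BY key LIMIT 2",
    (min_key,)).fetchall()

# Option 2: use a known lower bound for the data type (here, the
# smallest 64-bit integer), so the first page needs no extra query.
INT64_MIN = -(2 ** 63)
first_page_2 = conn.execute(
    "SELECT key FROM t WHERE key > ? ORDER BY key LIMIT 2",
    (INT64_MIN,)).fetchall()
```

Option 2 keeps the first-page query identical in shape to every subsequent page, which is why a sentinel minimum is often the simpler choice.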
v19.1/selection-queries.md, line 472 at r1 (raw file):
> O(1)

Shouldn't this be O(log(n))? My knowledge of index implementation could be rusty.
v19.1/selection-queries.md, line 491 at r1 (raw file):
> ordered

I think "ordered" might not be the right word here. Perhaps "sequential".
Force-pushed from 57ffcc4 to 003e882.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @rafiss)
v19.1/selection-queries.md, line 338 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
> i wonder if our docs should include a caveat somewhere that one should be careful when paginating over a set of records that may change during the process of pagination. depending on the use case, this might be OK, but if additional rows are being added while the app is trying to paginate through everything, both of the methods we talk about here allow the possibility of skipping over some records or seeing duplicate records.
That makes sense.
Is there a "Right Way" to mitigate this issue? Or is it specific to your use case / application? ("it depends")
E.g., @knz mentioned using AS OF SYSTEM TIME in the discussion on #3743, but as another person pointed out, that may not be acceptable in every case.
In other words, is there some way I should update the example SQL? Or should we just add a note saying "be aware of this, and mitigate it"?
v19.1/selection-queries.md, line 370 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
> we could include something about how you would get the first page of results when you don't know what the minimum value of the key is. that is, either `select min(key) from table` or use a known min value for the data type in question.
Fixed by adding a note with that information (across all 3 versions in this PR: 19.1, 19.2, 20.1)
v19.1/selection-queries.md, line 472 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
> O(1)
>
> shouldn't this be O(log(n))? my knowledge of index implementation could be rusty
I think I will remove this altogether. I think I was going for something more conceptual and it may have been a mistake to introduce notation that implies this level of precision.
v19.1/selection-queries.md, line 491 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
> ordered
>
> i think "ordered" might not be the right word here. perhaps "sequential"
Fixed by changing to "sequential".
Thanks for the review @rafiss - I addressed your feedback in every case but one, where I had another question.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)
v19.1/selection-queries.md, line 338 at r1 (raw file):
Previously, rmloveland (Rich Loveland) wrote…
> That makes sense.
>
> Is there a "Right Way" to mitigate this issue? Or is it specific to your use case / application? ("it depends")
>
> E.g., @knz mentioned using AS OF SYSTEM TIME in the discussion on #3743, but as another person pointed out, that may not be acceptable in every case.
>
> In other words, is there some way I should update the example SQL? Or should we just add a note saying "be aware of this, and mitigate it"?
So I think this conversation should be grounded properly by referring to what users usually expect, that is, SQL cursors.
CockroachDB does not support these, but they are useful to set expectations right.
A cursor is not much more than a long-standing transaction where the client can request more data from the same query through multiple (separate) statements.
Importantly, cursors operate over a snapshot of the database at the moment the cursor is opened (it's possible to open non-consistent cursors, but that's rarely used and also not always supported).
So folks used to cursors already expect that pagination operates over a snapshot of the database established when the query starts.
Bringing this back to CockroachDB, this means that it is a-OK to provide users with a feature that "anchors" the paginated read at some point in time, possibly slightly in the past (as long as it's not further in the past than the most recent statement in the same session).
My opinion remains that proper pagination should be done using AOST with a follower read timestamp: this guarantees there won't be concurrent updates (and thus no risk of txn retries during the pagination), and also enables reads to access other replicas than the leaseholder. If necessary we can provide a feature to make such a paginated statement automatically wait for a small delay before starting the read, to ensure it will capture all the latest writes by the same session (or direct users to use pg_sleep() before their AOST query to achieve the same manually).
Thanks for providing this context Raphael. I'll update this PR, then, to show how to paginate with AOST and a follower read timestamp (or ~equivalent, since that's an enterprise feature) to mimic cursor-like functionality.
I agree about recommending …
Reviewed 1 of 6 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)
Force-pushed from 003e882 to b7c31ee.
@rafiss thanks again for your review and comments. I think the stuff you mentioned is in the latest push. @knz since you shared some specific opinions, would you mind taking a look at the updates I just pushed? Here is a direct link to the patch. I tried to capture the gist of what you said without bringing follower reads into it, since those are an enterprise feature - went with '-1m' instead, since if I'm reading it right the current timestamp returned by …
Looks good to me. You may want to upsell Enterprise to the doc reader by mentioning follower reads as an alternative to a fixed interval.
@rmloveland, looks like you can merge now?
Summary of changes:

- Explain difference between keyset pagination and LIMIT/OFFSET
- Show examples of the former being fast and the latter being slow
- Show how to use EXPLAIN to check why the difference exists
- Add warning to LIMIT/OFFSET docs recommending keyset pagination
- ... all of the above for 19.1, 19.2, 20.1 docs

Fixes #3743.
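The keyset-vs-OFFSET difference summarized here can be sketched on toy data (sqlite3 standing in for CockroachDB; column names are illustrative). Both queries return the same page, but the OFFSET form must scan and discard every skipped row on each request, while the keyset form seeks directly into the index at the last key seen:

```python
# Toy comparison of LIMIT/OFFSET vs keyset ("seek") pagination.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, msg TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i, f"e{i}") for i in range(1, 101)])

page_size, last_seen_id = 10, 50

# LIMIT/OFFSET: the database walks past all 50 skipped rows on every
# request, so later pages get progressively slower.
offset_page = conn.execute(
    "SELECT id FROM events ORDER BY id LIMIT ? OFFSET ?",
    (page_size, last_seen_id)).fetchall()

# Keyset / seek: an indexed range scan starts right after the last id
# the client saw, so each page costs about the same.
keyset_page = conn.execute(
    "SELECT id FROM events WHERE id > ? ORDER BY id LIMIT ?",
    (last_seen_id, page_size)).fetchall()

assert offset_page == keyset_page  # both return ids 51..60
```

Running EXPLAIN (or sqlite's EXPLAIN QUERY PLAN) on the two statements is how the docs change shows the cost difference: the keyset query's scan is bounded by the key predicate, the OFFSET query's is not.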
Force-pushed from b7c31ee to 5cf73c6.