SQL pagination #1067
Potential reference point: https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/
Comment from @danhhz, from #657: The usual way to do this is one SELECT query per page that uses LIMIT/OFFSET, but since it doesn't make sense to run this in a SQL transaction, any insertions/deletions could make this an inconsistent view of the table. CockroachDB has AS OF SYSTEM TIME implemented, which means we can let the user iterate over a consistent version of their database without a transaction. This would work by selecting See the discussion in cockroachdb/cockroach#9227 (comment)
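The inconsistency described above is easy to reproduce. The following is a minimal sketch, assuming SQLite (via Python's standard `sqlite3` module) as a stand-in for CockroachDB and a hypothetical `events` table; it shows a row being returned twice when an insert lands between two LIMIT/OFFSET page requests. It does not demonstrate AS OF SYSTEM TIME, which SQLite does not have.

```python
import sqlite3

# Hypothetical table for illustration; SQLite stands in for CockroachDB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO events (id, created) VALUES (?, ?)",
    [(i, i) for i in range(1, 7)],
)

PAGE_SQL = "SELECT id FROM events ORDER BY created DESC, id DESC LIMIT ? OFFSET ?"


def fetch_page(limit, offset):
    """Return the ids of one page using plain LIMIT/OFFSET."""
    return [row[0] for row in conn.execute(PAGE_SQL, (limit, offset))]


page1 = fetch_page(3, 0)

# A new row arrives between the two page requests and sorts first.
conn.execute("INSERT INTO events (id, created) VALUES (7, 7)")

page2 = fetch_page(3, 3)

# Every row shifted down by one, so the last row of page 1 reappears on page 2.
repeated = sorted(set(page1) & set(page2))
print(page1, page2, repeated)
```

A deletion between requests has the opposite effect: a row is skipped instead of repeated.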
@jseldess That might sound like a good idea, but apart from dumping scenarios it mostly is not.
The most practical thing would be to allow, instead of an offset, a way to ask for everything after a specified id within the current sort order.
Since there is nothing but time as a serial factor here, the affected tables should not allow inserting anything with the same timestamp, or anything in that window at a later time.
A very nasty way to paginate would be:
SELECT * FROM tableName ORDER BY "createdDate" DESC LIMIT 5 OFFSET (
  SELECT rn FROM (
    SELECT id, row_number() OVER (ORDER BY "createdDate" DESC) AS rn FROM tableName
    ORDER BY "createdDate" DESC
  ) AS numbered
  WHERE id = 'd25b7651-b553-4857-9380-41027342b74e'
);
This is far from fast.
Hi, why not just:
CREATE INDEX ON tableName(createdDate DESC, id DESC);
SELECT * FROM tableName WHERE createdDate <= 'latest returned date' AND id < 'latest returned id' ORDER BY createdDate DESC, id DESC LIMIT 5;
I haven't tested it much, but it seems to be using the index just fine. Also notice the use of
If you have UUIDs as ids, that is already broken, and you're back to offset pagination, which is unsafe for any scenario where your application is actually being used and data may change while paginating.
Even with UUIDs as IDs this would work, as you order on date first, right? For offset pagination (assuming you aren't going for deep offsets)
No, not quite: the date sorting works, but you can't apply your id sort anymore. Your However, pagination with offsets is bad for a multitude of reasons anyway. But hey, srsly, enough people have written about that already, so I won't recall all the reasons why offset pagination is bad, but just link some random articles:
I'm not recommending OFFSET pagination if you are gonna use a big offset, as said in my comment. Why would
Actually you are right: with UUIDs the query I put would only work for rows with tied insertion time, not as general pagination. My bad.
This will work with UUIDs and a TIMESTAMP column in a performant way:
CREATE INDEX ON tableName(datetime DESC, id DESC) STORING (additional columns for the select);
SELECT * FROM tableName WHERE (datetime, id) < ('last returned datetime', 'last returned uuid') ORDER BY datetime DESC, id DESC LIMIT 5;
Is this a good generic solution for your case?
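For illustration, here is a minimal sketch of the row-value ("keyset") pagination suggested above, assuming SQLite 3.15 or later as a stand-in for CockroachDB. The table `t1`, the column `created_at` (standing in for `datetime`), the integer timestamps, and the helper names are all hypothetical. SQLite has no STORING clause, so the index only covers the two sort columns.

```python
import sqlite3
import uuid

# Hypothetical schema; created_at plays the role of the datetime column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id TEXT PRIMARY KEY, created_at INTEGER NOT NULL)")
conn.execute("CREATE INDEX t1_created_id ON t1 (created_at DESC, id DESC)")

# Several rows share a timestamp, so the UUID has to act as the tie-breaker.
rows = [(str(uuid.uuid4()), ts) for ts in (100, 100, 100, 200, 200, 300, 400)]
conn.executemany("INSERT INTO t1 (id, created_at) VALUES (?, ?)", rows)


def fetch_page(cursor=None, limit=3):
    """Fetch one page; cursor is the (created_at, id) of the last row already seen."""
    if cursor is None:
        sql = "SELECT created_at, id FROM t1 ORDER BY created_at DESC, id DESC LIMIT ?"
        params = (limit,)
    else:
        sql = (
            "SELECT created_at, id FROM t1 "
            "WHERE (created_at, id) < (?, ?) "
            "ORDER BY created_at DESC, id DESC LIMIT ?"
        )
        params = (cursor[0], cursor[1], limit)
    return conn.execute(sql, params).fetchall()


def paginate_all(limit=3):
    """Walk every page, carrying the last row forward as the next cursor."""
    seen, cursor = [], None
    while True:
        page = fetch_page(cursor, limit)
        if not page:
            return seen
        seen.extend(page)
        cursor = page[-1]


all_rows = paginate_all()
print(len(all_rows))
```

The UUID ordering carries no meaning here; it only needs to be consistent between requests so that rows with tied timestamps are neither skipped nor repeated.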
No, still won't work.
The only valid operation on an
Right, if you are inserting data "back in time". Assuming new insertions are at "now", this should work. Still not general for all cases.
Semantically yeah, but we don't really care that the ordering makes sense, just that it's consistent, so either lexicographic or byte-based ordering would do for pagination with this query (assuming that new insertions are "now", so any client already paginating to older results doesn't get affected by those new insertions).
Closing this in favor of a separate issue to recommend client-side pagination in most cases: #3743
From @bdarnell, overheard in a thread with a user:
limit/offset is a really inefficient way of doing pagination. unless you really need to be able to jump to arbitrary pages, it's better to have each key include the index of the last entry returned (or the first entry not returned) so you can pick up the query from there on the next request
if you do need to use limit/offset, then it's often better to split it into two queries: one query that uses limit/offset and just reads the task_id (so it can use the index alone without joining with the primary table), and a second query that uses
where task_id in (...)
to get the data (assuming task_id is the primary key)
Also:
limit/offset pagination gets a lot more efficient if your index includes enough columns to satisfy the group by/order by clauses without joining against the primary data
And then from @RaduBerinde:
yes, if the index is sorted like the order by clause, we won't need to sort, which means we will just read (limit + offset) rows. This applies even if we are doing an index-join because other columns are needed by the select.
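The two-query variant of limit/offset that @bdarnell describes can be sketched as follows. This is an illustrative example only, assuming SQLite as a stand-in for CockroachDB; the `tasks` table, its `priority` and `payload` columns, and the index name are hypothetical. The first query reads only `task_id` in index order, and the second fetches the full rows by primary key.

```python
import sqlite3

# Hypothetical schema: task_id is the primary key, priority is the sort column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (task_id INTEGER PRIMARY KEY, "
    "priority INTEGER NOT NULL, payload TEXT NOT NULL)"
)
# The index matches the ORDER BY, so the id query can be served from the index alone.
conn.execute("CREATE INDEX tasks_priority ON tasks (priority, task_id)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?)",
    [(i, i % 5, f"payload-{i}") for i in range(1, 21)],
)


def fetch_page(limit, offset):
    """Two-step LIMIT/OFFSET: find the ids first, then load the full rows."""
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT task_id FROM tasks ORDER BY priority, task_id LIMIT ? OFFSET ?",
            (limit, offset),
        )
    ]
    if not ids:
        return []
    placeholders = ", ".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT task_id, priority, payload FROM tasks WHERE task_id IN ({placeholders})",
        ids,
    ).fetchall()
    # IN (...) does not guarantee order, so restore the order from the first query.
    by_id = {row[0]: row for row in rows}
    return [by_id[task_id] for task_id in ids]


page = fetch_page(4, 4)
print([row[0] for row in page])
```

The database still walks limit + offset index entries for the first query, so this reduces the cost per skipped row rather than removing it.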