SQL pagination #1067

Closed
jseldess opened this issue Feb 3, 2017 · 18 comments

@jseldess
Contributor

jseldess commented Feb 3, 2017

From @bdarnell, overheard in a thread with a user:

limit/offset is a really inefficient way of doing pagination. unless you really need to be able to jump to arbitrary pages, it's better to have each key include the index of the last entry returned (or the first entry not returned) so you can pick up the query from there on the next request
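
A keyset sketch of the approach described above, assuming a `tasks` table whose primary key `task_id` is sortable (all names here are illustrative):

```sql
-- First page: plain LIMIT, no OFFSET.
SELECT task_id, payload
FROM tasks
ORDER BY task_id
LIMIT 25;

-- Next page: resume from the last task_id the client saw,
-- instead of skipping rows with OFFSET.
SELECT task_id, payload
FROM tasks
WHERE task_id > 'last-task-id-from-previous-page'
ORDER BY task_id
LIMIT 25;
```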

if you do need to use limit/offset, then it's often better to split it into two queries: one query that uses limit/offset and just reads the task_id (so it can use the index alone without joining with the primary table), and a second query that uses where task_id in (...) to get the data (assuming task_id is the primary key)
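
A sketch of that two-query split, again assuming `task_id` is the primary key and is readable from an index alone:

```sql
-- Query 1: paginate over the index alone; only task_id is read,
-- so no join against the primary table is needed.
SELECT task_id
FROM tasks
ORDER BY task_id
LIMIT 25 OFFSET 500;

-- Query 2: fetch the full rows for just those ids.
SELECT *
FROM tasks
WHERE task_id IN ('id-1', 'id-2', 'id-3' /* ... ids from query 1 */);
```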

Also:

limit/offset pagination gets a lot more efficient if your index includes enough columns to satisfy the group by/order by clauses without joining against the primary data
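
For example, an index that covers the ORDER BY column and stores the selected columns (CockroachDB's STORING clause; table and column names are illustrative) lets the scan avoid joining back to the primary data:

```sql
-- Covers the sort column and stores the selected payload,
-- so the query below is satisfied from the index alone.
CREATE INDEX tasks_created_idx ON tasks (created_at DESC) STORING (payload);

SELECT created_at, payload
FROM tasks
ORDER BY created_at DESC
LIMIT 25 OFFSET 100;
```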

And then from @RaduBerinde:

yes, if the index is sorted like the order by clause, we won't need to sort, which means we will just read (limit + offset) rows. This applies even if we are doing an index-join because other columns are needed by the select.

@sploiselle
Contributor

Potential reference point: https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/

@jseldess
Contributor Author

Comment from @danhhz, from #657:

The usual way to do this is one SELECT query per page that uses LIMIT/OFFSET, but since it doesn't make sense to run this in a sql transaction, any insertions/deletions could make this an inconsistent view of the table.

CockroachDB has AS OF SYSTEM TIME implemented, which means we can let the user iterate over a consistent version of their database without a transaction. This would work by selecting cluster_logical_timestamp() in the first SELECT, passing it through the page (like limit and offset are) and using it in subsequent queries. @mjibson has already implemented paging exactly like this for the ./cockroach dump command.
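
A sketch of that pattern, assuming the timestamp captured on the first page is passed back with each subsequent request (the literal timestamp below is a placeholder):

```sql
-- First page: capture the cluster timestamp alongside the results.
SELECT cluster_logical_timestamp() AS page_ts, task_id, payload
FROM tasks
ORDER BY task_id
LIMIT 25;

-- Subsequent pages: read at that same timestamp, so every page
-- sees one consistent version of the table, without a transaction.
SELECT task_id, payload
FROM tasks AS OF SYSTEM TIME '2017-02-03 12:00:00'
ORDER BY task_id
LIMIT 25 OFFSET 25;
```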

See the discussion in cockroachdb/cockroach#9227 (comment)

@wzrdtales
Contributor

@jseldess That might sound like a good idea, but outside of dump scenarios it mostly isn't. AS OF SYSTEM TIME will also deliver outdated results; for pagination you only want to avoid strange pagination behavior, you don't want to serve stale data.

@wzrdtales
Contributor

The most practical approach would be to replace the offset with a way to return everything after a specified id within the current sort order.
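
In SQL terms, assuming `id` is sortable and indexed (names are illustrative), that would look something like:

```sql
-- "Give me the next page after this id", within the current sort order.
SELECT *
FROM tableName
WHERE id > 'last-id-from-previous-page'
ORDER BY id
LIMIT 5;
```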

@wzrdtales
Contributor

wzrdtales commented Mar 31, 2018

Since time is the only serial factor here, the affected tables should not allow inserting anything with the same timestamp, or anything in that window at a later time.

@wzrdtales
Contributor

wzrdtales commented Mar 31, 2018

A very nasty way to paginate would be:

```sql
SELECT * FROM tableName ORDER BY "createdDate" DESC LIMIT 5 OFFSET (
  SELECT rn FROM (
    SELECT id, row_number() OVER (ORDER BY "createdDate" DESC) AS rn
    FROM tableName
  ) AS numbered
  WHERE id = 'd25b7651-b553-4857-9380-41027342b74e'
);
```

which is far from fast.

@antoniomo

antoniomo commented Jul 18, 2018

Hi,

Why not just:

```sql
CREATE INDEX ON tableName ("createdDate" DESC, id DESC);

SELECT * FROM tableName
WHERE "createdDate" <= 'latest returned date' AND id < 'latest returned id'
ORDER BY "createdDate" DESC, id DESC
LIMIT 5;
```

I haven't tested it much, but it seems to use the index just fine. Also notice the `<=` on the date column, which makes sure we don't skip values when several rows share the same creation date.

@wzrdtales
Contributor

wzrdtales commented Jul 18, 2018

If you have UUIDs as IDs, that is already broken, and you're back to offset pagination, which is unsafe for any scenario where your application is actually in use and data may change while paginating.

@antoniomo

Even with UUIDs as IDs this would work, since you order on the date first, right? For offset pagination (assuming you aren't going for deep offsets), AS OF SYSTEM TIME is quite reasonable, taking the current_timestamp() of the first page when the client requested it.

@wzrdtales
Contributor

No, not quite: the date sorting works, but you can't apply your id sort anymore, because `AND id <` doesn't work with UUIDs. AS OF SYSTEM TIME is not reasonable at all, actually, since you freeze the system's state; that is not applicable for most applications out there, unless your customers don't mind seeing plain old data just because they started paginating. Sorting on the timestamp is only really usable if the times don't overlap, but most of the time it doesn't matter much if they do, so that can usually be ignored.

However, pagination with offsets is bad for a multitude of reasons anyway.

But seriously, enough people have written about this already, so I won't recall all the reasons why offset pagination is bad and will just link some articles:
https://blog.jooq.org/2016/08/10/why-most-programmers-get-pagination-wrong/
https://coderwall.com/p/lkcaag/pagination-you-re-probably-doing-it-wrong

@antoniomo

antoniomo commented Jul 19, 2018

I'm not recommending OFFSET pagination if you're going to use a big offset, as I said in my comment. Why would `AND id <` not work with UUIDs if you have an index sorting them in a consistent way? It's only there to break ties consistently among items inserted at the exact same time, nothing more; the UUID order doesn't matter as long as it's consistent across queries.

@antoniomo

Actually you are right, with UUIDs the query I put would only work for those with tied insertion time, not as a general pagination, my bad.

@antoniomo

This will work with UUIDs and a TIMESTAMP column in a performant way:

```sql
CREATE INDEX ON tableName (datetime DESC, id DESC) STORING (additional columns for the SELECT);

SELECT * FROM tableName
WHERE (datetime, id) < ('last returned datetime', 'last returned uuid')
ORDER BY datetime DESC, id DESC
LIMIT 5;
```

Is this a good generic solution for your case?

@wzrdtales
Contributor

No, that still won't work. UUIDs do not have a consistent behaviour; they're random. So your comparison operation, smaller or bigger, will behave inconsistently and can change when data gets inserted.

@wzrdtales
Contributor

wzrdtales commented Jul 19, 2018

The only valid operation to compare a UUID with another one is, in fact, (not) equal.

@antoniomo

Right, if you are inserting data "back in time". Assuming new insertions happen at "now", this should work. Still not general for all cases.

@antoniomo

The only valid operation on an UUID to compare it with another one would be (not) equal actually.

Semantically, yeah, but we don't really care that the ordering makes sense, just that it's consistent, so either lexicographic or byte-based ordering would do for pagination with this query (assuming new insertions happen at "now", so clients already paginating to older results aren't affected by them).

@jseldess
Contributor Author

Closing this in favor of a separate issue to recommend client-side pagination in most cases: #3743
