[8.x] Mixed orders in cursor paginate #37762

halaei · 2021-06-21T13:15:35Z

It is possible to remove the following limitation from cursorPaginate() function:

It requires that the "order by" directions (descending / ascending) are the same if there are multiple "order by" clauses.

Example:

$cursor = ['test' => 'foo', 'another' => 'bar', 'third' => 'baz'];
$query->...->orderBy('test')->orderByDesc('another')->orderBy('third');

A desired SQL statement:

SELECT * FROM table
WHERE
    (test > 'foo' or
       (test = 'foo' and (another < 'bar' or
           (another = 'bar' and (third > 'baz')))))
ORDER BY test asc, another desc, third asc
LIMIT 100

Todo

Remove the limitation from docs.
Remove the deprecated CursorPaginationException in 9.x.
Check if DBMS can handle queries generated by this PR as efficiently as queries generated by the current Laravel release.
~~Remove the redundant parentheses generated by this PR~~.
Add more tests to check it handles cursors with more that 2 variables and mixed directions.

Tests

I tried to fix tests by checking the resulting query and bindings every time the mocked get() function is called.

driesvints · 2021-06-21T13:31:05Z

Ping @paras-malhotra

paras-malhotra · 2021-06-24T15:57:23Z

@halaei, the reason I didn't take this path is because of the changing behaviour of index usage based on the DB engine. I guess if we construct queries like this, we'll need to check if all 4 databases are able to use: (a) a composite index and (b) an index on the first column if a composite index is missing, on the queries constructed. The benefit of the tuple comparison operator is that the DB engine takes care of this automatically.

I haven't looked into index support for database engines but you may find this article useful: https://use-the-index-luke.com/sql/partial-results/fetch-next-page

halaei · 2021-06-24T18:41:54Z

I am not sure about database support either. Ideally DB engines should be able to find and use the best index in the most efficient way without any hint, but I don't know if they are all as perfect as we need them to be. Regardless of this PR, I would really like to know how DB engines work in this regard.
If there is a performance issue for queries in this PR, beside reporting the issue to DB maintainers :) maybe we can use the simpler tuple comparison clause if possible and fallback to queries generated by this PR if it isn't possible.

One important consideration is that according to sql-workbenck, tuple comparison with < or > is not supported in SQL-Server and Oracle. So this PR removes another limitation for SQL-Server, which is officially supported by Laravel.
Moreover, since this PR doesn't use raw queries, it should probably work for jenssegers/laravel-mongodb too, which is personally important to me.

paras-malhotra · 2021-06-24T19:17:22Z

I'd recommend we stick to the tuple comparison to avoid extra complexity on constructing queries for index usage. If we go with this PR, we should test it or check out the RDBMS docs to ensure the index is actually used. The whole idea of cursor pagination is for better query performance, and if it messes up the execution plan, in the end it may not be worth it from a performance perspective when compared to offset pagination.

For the issue with the incompatibility with SQL Server, I think we can just mention that in the docs. This way, at least we're sure that cursor pagination does in fact increase query performance for the databases that are supported.

halaei · 2021-06-25T11:33:41Z

According to MySQL optimization docs, there are scenarios where using row constructor, e.g. tuples, actually degrade the performance. So maybe not using tuple comparison is actually a performance optimization! I try to check it with some benchmark.

halaei · 2021-06-25T13:03:22Z

My benchmark result for MySQL 8.0.25:

Table

CREATE TABLE `foos` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `foo` int NOT NULL,
  `bar` int NOT NULL,
  `baz` int NOT NULL,
  `created_at` timestamp NULL DEFAULT NULL,
  `updated_at` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `foos_foo_bar_baz_index` (`foo`,`bar`,`baz`)
) ENGINE=InnoDB AUTO_INCREMENT=2000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Data

The table is seeded with 2 million records, with 0<=foo<10, 0<=bar<10, 0<=baz<20000. This distribution noticibly slows down queries that can't use the entire 12-byte length of the multi-column index.

for ($foo = 0; $foo < 10; $foo++) {
    for ($bar = 0; $bar < 10; $bar++) {
        for ($i = 0; $i < 20000; $i += 100) {
            $values = [];
            $date = now();
            for ($baz = $i; $baz < $i + 100; $baz++) {
                $values[] = [
                    'foo' => $foo,
                    'bar' => $bar,
                    'baz' => $baz,
                    'created_at' => $date,
                    'updated_at' => $date,
                ];
            }
            \App\Models\Foo::query()->insert($values);
        }
    }
}

Queries

In the following queries we try to get 50 elements from the near end of the index, to make sure they will be slow unless they perfectly use the multi-column index.

1. Paginating all items:

1.1. Using single column comparison:

select * from `foos` where (`foo` > 9 or (`foo` = 9 and (`bar` > 9 or (`bar` = 9 and (`baz` > 19000))))) order by `foo` asc, `bar` asc, `baz` asc limit 51;

Runtime: less than 2 milliseconds.
key_len reported by explain select statement: 12.

1.2. Using tuple comparison:

select * from `foos` where (`foo`, `bar`, `baz`) > (9, 9, 19000) order by `foo` asc, `bar` asc, `baz` asc limit 51

Runtime: 16 seconds.
key_len reported by explain select statement: 12.
This result must be a bug from MySQL. It is multiple times slower than scanning the whole table, e.g. by select count(*).

2. Paginating items with `foo = 9`:

2.1. Using single column comparison:

select * from `foos` where `foo` = 9 and (`bar` > 9 or (`bar` = 9 and (`baz` > 19000))) order by `bar` asc, `baz` asc limit 51

Runtime: less than 2 milliseconds.
key_len reported by explain select statement: 12.

2.2. Using tuple comparison:

select * from `foos` where `foo` = 9 and (`bar`, `baz`) > (9, 19000) order by `bar` asc, `baz` asc limit 51

Runtime: 0.5 seconds.
key_len reported by explain select statement: 4.
This result agrees with MySQL documentation: https://dev.mysql.com/doc/refman/8.0/en/row-constructor-optimization.html

Conclusion

With MySQL, it is better to use single column comparison for cursor pagination of rows sorted by a 3-column index. Using tuple comparison is slower than simple pagination for some cases. This is most probably caused by a bug in MySQL and InnoDB engine.

taylorotwell · 2021-06-28T14:21:27Z

@paras-malhotra any other thoughts on this?

paras-malhotra · 2021-06-28T14:29:22Z

Based on the analysis by @halaei and #37216 (comment), it does seem that this PR will improve performance for mySQL but I'm not sure how it compares for PostgreSQL.

I guess, since there are more mySQL users than PostgreSQL and also the fact that SQL Server isn't supported by the current cursor pagination, I think it makes sense to accept this PR. 👍

halaei · 2021-06-28T15:18:29Z

For the record, MySQL has confirmed the performance bug I reported as "a duplicate of an internally-filed bug report, which is not yet scheduled for fixing":
https://bugs.mysql.com/bug.php?id=104128

taylorotwell · 2021-06-29T13:22:56Z

Do we have any integration tests on this feature that actually hit the database?

halaei · 2021-06-29T14:38:20Z

@taylorotwell there are some tests here:

framework/tests/Database/DatabaseEloquentIntegrationTest.php

Lines 324 to 402 in 855a919

    
           public function testCursorPaginatedModelCollectionRetrieval() 
        
           { 
        
               EloquentTestUser::create(['id' => 1, 'email' => 'taylorotwell@gmail.com']); 
        
               EloquentTestUser::create($secondParams = ['id' => 2, 'email' => 'abigailotwell@gmail.com']); 
        
               EloquentTestUser::create(['id' => 3, 'email' => 'foo@gmail.com']); 
        
               CursorPaginator::currentCursorResolver(function () { 
        
                   return null; 
        
               }); 
        
               $models = EloquentTestUser::oldest('id')->cursorPaginate(2); 
        
               $this->assertCount(2, $models); 
        
               $this->assertInstanceOf(CursorPaginator::class, $models); 
        
               $this->assertInstanceOf(EloquentTestUser::class, $models[0]); 
        
               $this->assertInstanceOf(EloquentTestUser::class, $models[1]); 
        
               $this->assertSame('taylorotwell@gmail.com', $models[0]->email); 
        
               $this->assertSame('abigailotwell@gmail.com', $models[1]->email); 
        
               $this->assertTrue($models->hasMorePages()); 
        
               $this->assertTrue($models->hasPages()); 
        
               CursorPaginator::currentCursorResolver(function () use ($secondParams) { 
        
                   return new Cursor($secondParams); 
        
               }); 
        
               $models = EloquentTestUser::oldest('id')->cursorPaginate(2); 
        
               $this->assertCount(1, $models); 
        
               $this->assertInstanceOf(CursorPaginator::class, $models); 
        
               $this->assertInstanceOf(EloquentTestUser::class, $models[0]); 
        
               $this->assertSame('foo@gmail.com', $models[0]->email); 
        
               $this->assertFalse($models->hasMorePages()); 
        
               $this->assertTrue($models->hasPages()); 
        
           } 
        
           public function testPreviousCursorPaginatedModelCollectionRetrieval() 
        
           { 
        
               EloquentTestUser::create(['id' => 1, 'email' => 'taylorotwell@gmail.com']); 
        
               EloquentTestUser::create(['id' => 2, 'email' => 'abigailotwell@gmail.com']); 
        
               EloquentTestUser::create($thirdParams = ['id' => 3, 'email' => 'foo@gmail.com']); 
        
               CursorPaginator::currentCursorResolver(function () use ($thirdParams) { 
        
                   return new Cursor($thirdParams, false); 
        
               }); 
        
               $models = EloquentTestUser::oldest('id')->cursorPaginate(2); 
        
               $this->assertCount(2, $models); 
        
               $this->assertInstanceOf(CursorPaginator::class, $models); 
        
               $this->assertInstanceOf(EloquentTestUser::class, $models[0]); 
        
               $this->assertInstanceOf(EloquentTestUser::class, $models[1]); 
        
               $this->assertSame('taylorotwell@gmail.com', $models[0]->email); 
        
               $this->assertSame('abigailotwell@gmail.com', $models[1]->email); 
        
               $this->assertTrue($models->hasMorePages()); 
        
               $this->assertTrue($models->hasPages()); 
        
           } 
        
           public function testCursorPaginatedModelCollectionRetrievalWhenNoElements() 
        
           { 
        
               CursorPaginator::currentCursorResolver(function () { 
        
                   return null; 
        
               }); 
        
               $models = EloquentTestUser::oldest('id')->cursorPaginate(2); 
        
               $this->assertCount(0, $models); 
        
               $this->assertInstanceOf(CursorPaginator::class, $models); 
        
               Paginator::currentPageResolver(function () { 
        
                   return new Cursor(['id' => 1]); 
        
               }); 
        
               $models = EloquentTestUser::oldest('id')->cursorPaginate(2); 
        
               $this->assertCount(0, $models); 
        
           } 
        
           public function testCursorPaginatedModelCollectionRetrievalWhenNoElementsAndDefaultPerPage() 
        
           { 
        
               $models = EloquentTestUser::oldest('id')->cursorPaginate(); 
        
               $this->assertCount(0, $models); 
        
               $this->assertInstanceOf(CursorPaginator::class, $models); 
        
           }

driesvints · 2021-06-29T14:39:24Z

@halaei can you rebase with 8.x please?

taylorotwell · 2021-07-06T13:54:08Z

Can you mark this as draft until all your checkbox tasks are complete?

halaei · 2021-07-07T20:30:16Z

The only remaining checkbox can only be done after this PR is merged. And I don't think it is really required to "remove the redundant parentheses generated by this PR", but let me know if I you need me to work on it.

halaei force-pushed the mix-orders-for-cursor-paginate-8 branch from 0638ded to cce9fe3 Compare June 21, 2021 13:17

This was referenced Jun 29, 2021

[8.x] Add cursor pagination (aka keyset pagination) #37216

Merged

Better Query Planning lampager/lampager#38

Open

Mixed orders in cursor paginate

8d656a8

halaei force-pushed the mix-orders-for-cursor-paginate-8 branch from cce9fe3 to 8d656a8 Compare June 30, 2021 07:07

driesvints mentioned this pull request Jul 5, 2021

[8.x] Cursor pagination fixes #37915

Merged

driesvints marked this pull request as draft July 6, 2021 13:56

Test cursor paginate with mixed orders

2cb8c68

halaei mentioned this pull request Jul 7, 2021

[8.x] Remove limitation from cursorPaginate() laravel/docs#7177

Merged

halaei marked this pull request as ready for review July 7, 2021 20:26

formatting

1f3f31e

taylorotwell merged commit 80f0112 into laravel:8.x Jul 8, 2021

driesvints mentioned this pull request Jul 13, 2021

[9.x] Remove CursorPaginationException #37995

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[8.x] Mixed orders in cursor paginate #37762

[8.x] Mixed orders in cursor paginate #37762

halaei commented Jun 21, 2021 •

edited

Loading

driesvints commented Jun 21, 2021

paras-malhotra commented Jun 24, 2021 •

edited

Loading

halaei commented Jun 24, 2021

paras-malhotra commented Jun 24, 2021 •

edited

Loading

halaei commented Jun 25, 2021

halaei commented Jun 25, 2021

taylorotwell commented Jun 28, 2021

paras-malhotra commented Jun 28, 2021

halaei commented Jun 28, 2021

taylorotwell commented Jun 29, 2021

halaei commented Jun 29, 2021

driesvints commented Jun 29, 2021

taylorotwell commented Jul 6, 2021

halaei commented Jul 7, 2021

[8.x] Mixed orders in cursor paginate #37762

[8.x] Mixed orders in cursor paginate #37762

Conversation

halaei commented Jun 21, 2021 • edited Loading

Todo

Tests

driesvints commented Jun 21, 2021

paras-malhotra commented Jun 24, 2021 • edited Loading

halaei commented Jun 24, 2021

paras-malhotra commented Jun 24, 2021 • edited Loading

halaei commented Jun 25, 2021

halaei commented Jun 25, 2021

Table

Data

Queries

1. Paginating all items:

1.1. Using single column comparison:

1.2. Using tuple comparison:

2. Paginating items with foo = 9:

2.1. Using single column comparison:

2.2. Using tuple comparison:

Conclusion

taylorotwell commented Jun 28, 2021

paras-malhotra commented Jun 28, 2021

halaei commented Jun 28, 2021

taylorotwell commented Jun 29, 2021

halaei commented Jun 29, 2021

driesvints commented Jun 29, 2021

taylorotwell commented Jul 6, 2021

halaei commented Jul 7, 2021

halaei commented Jun 21, 2021 •

edited

Loading

paras-malhotra commented Jun 24, 2021 •

edited

Loading

paras-malhotra commented Jun 24, 2021 •

edited

Loading

2. Paginating items with `foo = 9`: