Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove SortKeyCursor (~5-10% faster merge) #5895

Merged
merged 3 commits into from
Apr 7, 2023

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Apr 6, 2023

Which issue does this PR close?

Part of #5882
Relates to #5879

Rationale for this change

Replaces SortKeyCursor with a simplified, crate-private version. I couldn't see an obvious reason why SortKeyCursor was public other than to enable integration tests. Especially now that arrow-row exists, I don't see a compelling reason to keep it around.

merge sorted i64        time:   [8.7181 ms 8.7273 ms 8.7378 ms]
                        change: [-8.6612% -8.5375% -8.3983%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

sort merge i64          time:   [8.9273 ms 8.9396 ms 8.9521 ms]
                        change: [-8.4047% -8.2575% -8.0995%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

merge sorted f64        time:   [8.6883 ms 8.7002 ms 8.7140 ms]
                        change: [-8.4371% -8.2785% -8.1206%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

sort merge f64          time:   [8.9428 ms 8.9550 ms 8.9680 ms]
                        change: [-7.8564% -7.7071% -7.5387%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

merge sorted utf8 low cardinality
                        time:   [6.2272 ms 6.2357 ms 6.2448 ms]
                        change: [-14.595% -14.461% -14.314%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

sort merge utf8 low cardinality
                        time:   [6.8843 ms 6.9109 ms 6.9380 ms]
                        change: [-13.368% -12.957% -12.521%] (p = 0.00 < 0.05)
                        Performance has improved.

merge sorted utf8 high cardinality
                        time:   [10.147 ms 10.224 ms 10.307 ms]
                        change: [-6.2273% -5.3649% -4.5217%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 21 outliers among 100 measurements (21.00%)
  3 (3.00%) high mild
  18 (18.00%) high severe

sort merge utf8 high cardinality
                        time:   [11.191 ms 11.254 ms 11.321 ms]
                        change: [-7.4775% -6.7869% -6.0794%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

merge sorted utf8 tuple time:   [17.150 ms 17.169 ms 17.188 ms]
                        change: [-4.9041% -4.7286% -4.5686%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

sort merge utf8 tuple   time:   [20.286 ms 20.399 ms 20.511 ms]
                        change: [-5.5443% -4.8463% -4.1378%] (p = 0.00 < 0.05)
                        Performance has improved.

merge sorted utf8 dictionary
                        time:   [5.6368 ms 5.6413 ms 5.6461 ms]
                        change: [-14.343% -14.257% -14.164%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

sort merge utf8 dictionary
                        time:   [5.7874 ms 5.7961 ms 5.8055 ms]
                        change: [-14.409% -14.234% -14.047%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

merge sorted utf8 dictionary tuple
                        time:   [8.3309 ms 8.3364 ms 8.3423 ms]
                        change: [-10.213% -10.122% -10.033%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

sort merge utf8 dictionary tuple
                        time:   [9.6855 ms 9.7650 ms 9.8424 ms]
                        change: [-8.8845% -7.8176% -6.7832%] (p = 0.00 < 0.05)
                        Performance has improved.

merge sorted mixed dictionary tuple
                        time:   [14.514 ms 14.533 ms 14.553 ms]
                        change: [-5.1546% -5.0148% -4.8671%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

sort merge mixed dictionary tuple
                        time:   [15.994 ms 16.073 ms 16.151 ms]
                        change: [-4.7746% -4.1442% -3.4971%] (p = 0.00 < 0.05)
                        Performance has improved.

merge sorted mixed tuple
                        time:   [16.649 ms 16.675 ms 16.702 ms]
                        change: [-4.6957% -4.5328% -4.3548%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

sort merge mixed tuple  time:   [18.427 ms 18.477 ms 18.527 ms]
                        change: [-5.4182% -5.0185% -4.6040%] (p = 0.00 < 0.05)
                        Performance has improved.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Yes SortKeyCursor is removed, it is highly unlikely anyone was using this as it is very coupled with SortPreservingMerge

@tustvold tustvold added the api change Changes the API exposed to users of the crate label Apr 6, 2023
@github-actions github-actions bot added the core Core DataFusion crate label Apr 6, 2023
@alamb alamb mentioned this pull request Apr 6, 2023
6 tasks
@tustvold tustvold force-pushed the simplify-sort-key-cursor branch from 7a8b225 to 343b82d Compare April 7, 2023 09:34
@tustvold tustvold marked this pull request as ready for review April 7, 2023 09:35
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @tustvold

cc @yjshen and @richox who may have some history with this code

@tustvold tustvold merged commit cb8a170 into apache:main Apr 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants