Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup for single vector sort/bitonic merge (and minor cleanup for argsort/argselect) #152

Merged
merged 6 commits into from
Oct 22, 2024

Conversation

sterrettm2
Copy link
Contributor

@sterrettm2 sterrettm2 commented May 16, 2024

This patch rewrites all of the single vector sorting and bitonic merging to use swizzle ops and generic masks to reduce code duplication. It also centralizes all of this logic into one file. Also I did a small cleanup to the argsort code

This should have no impact on performance at all, and my testing seems to confirm this:

AVX512 1 million performance
Benchmark                                                              Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------------------
[simdsort/random_1m vs. simdsort/random_1m]/uint64_t                -0.0000         -0.0000       7758757       7758751       7758474       7758413
[simdsort/random_1m vs. simdsort/random_1m]/int64_t                 +0.0044         +0.0043       7753754       7787564       7753621       7786777
[simdsort/random_1m vs. simdsort/random_1m]/uint32_t                +0.0031         +0.0031       3601040       3612155       3600724       3611744
[simdsort/random_1m vs. simdsort/random_1m]/int32_t                 -0.0007         -0.0007       3615923       3613349       3615524       3613013
[simdsort/random_1m vs. simdsort/random_1m]/uint16_t                +0.0022         +0.0021     121162973     121427061     121160994     121420633
[simdsort/random_1m vs. simdsort/random_1m]/int16_t                 -0.0024         -0.0023     121280015     120989927     121272200     120988256
[simdsort/random_1m vs. simdsort/random_1m]/float                   +0.0039         +0.0038       3482392       3495850       3482070       3495381
[simdsort/random_1m vs. simdsort/random_1m]/double                  -0.0048         -0.0048       6564316       6533029       6564143       6532438
OVERALL_GEOMEAN                                                     +0.0007         +0.0007             0             0             0             0
AVX512 1k performance
Benchmark                                                              Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------------------
[simdsort/random_1k vs. simdsort/random_1k]/uint64_t                +0.0034         +0.0033          4372          4387          4377          4392
[simdsort/random_1k vs. simdsort/random_1k]/int64_t                 -0.0019         -0.0016          4373          4365          4375          4368
[simdsort/random_1k vs. simdsort/random_1k]/uint32_t                +0.0022         +0.0010          2459          2465          2469          2471
[simdsort/random_1k vs. simdsort/random_1k]/int32_t                 +0.0066         +0.0075          2453          2469          2463          2482
[simdsort/random_1k vs. simdsort/random_1k]/uint16_t                -0.0042         -0.0042         64846         64577         64860         64588
[simdsort/random_1k vs. simdsort/random_1k]/int16_t                 -0.0009         -0.0010         65761         65699         65786         65718
[simdsort/random_1k vs. simdsort/random_1k]/float                   +0.0099         +0.0099          2295          2318          2305          2328
[simdsort/random_1k vs. simdsort/random_1k]/double                  +0.0020         +0.0011          3792          3800          3800          3805
OVERALL_GEOMEAN                                                     +0.0021         +0.0020             0             0             0             0                                                    +0.0014         +0.0015             0             0             0             0
AVX512 16 value performance
Benchmark                                                              Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------------------
[simdsort/random_16 vs. simdsort/random_16]/uint64_t                -0.0004         -0.0002           859           859           862           862
[simdsort/random_16 vs. simdsort/random_16]/int64_t                 +0.0008         +0.0007           861           862           864           865
[simdsort/random_16 vs. simdsort/random_16]/uint32_t                +0.0004         +0.0005           850           850           853           853
[simdsort/random_16 vs. simdsort/random_16]/int32_t                 -0.0002         -0.0002           851           851           854           854
[simdsort/random_16 vs. simdsort/random_16]/uint16_t                +0.0000         -0.0000          1124          1124          1126          1126
[simdsort/random_16 vs. simdsort/random_16]/int16_t                 +0.0024         +0.0023          1133          1135          1135          1138
[simdsort/random_16 vs. simdsort/random_16]/float                   +0.0062         +0.0067           868           873           870           876
[simdsort/random_16 vs. simdsort/random_16]/double                  +0.0018         +0.0019           867           868           869           871
OVERALL_GEOMEAN                                                     +0.0014         +0.0015             0             0             0             0

I also simply deduplicated the logic for argsort/argselect, and cleaned up some naming there

Note that currently the test suite does not test the single vector sorting logic for qsort. The key-value versions are effectively tested by the kvsort tests, but nothing ends up testing the non key-value versions.

Copy link
Contributor

@r-devulap r-devulap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. nice little clean up! Also confirm no performance regression on ICX. Thanks @sterrettm2

@r-devulap r-devulap merged commit 990ae6c into intel:main Oct 22, 2024
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants