Vectorize lexicographical_compare! #4552

@AlexGuteniev AlexGuteniev commented Apr 1, 2024

Resolves #3998

This reuses the existing mismatch vectorization and the existing classifiers.
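As a rough illustration of why a fast mismatch is enough: a lexicographical comparison can be reduced to finding the first differing position in the common prefix and then comparing one pair of elements. The sketch below uses plain `std::mismatch` on `std::vector`; the helper name `lexi_less_via_mismatch` is hypothetical and this is not the code from the PR, which works on the STL's internal vectorized routines.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (an assumption for illustration, not the STL's internals):
// lexicographical "less than" for two vectors, built on top of std::mismatch.
template <class T>
bool lexi_less_via_mismatch(const std::vector<T>& a, const std::vector<T>& b) {
    // Only the common prefix can be compared element by element.
    const auto n = static_cast<std::ptrdiff_t>(std::min(a.size(), b.size()));

    // Find the first position where the two ranges differ.
    const auto pos = std::mismatch(a.begin(), a.begin() + n, b.begin());

    if (pos.first == a.begin() + n) {
        // The common prefix is equal, so the shorter range compares less.
        return a.size() < b.size();
    }

    // Otherwise the first differing pair of elements decides the result.
    return *pos.first < *pos.second;
}
```

With this shape, any speedup in the mismatch step carries over directly, since the rest is a constant amount of work per call.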

Benchmark

Before:

Benchmark                                     Time             CPU   Iterations
bm<uint8_t, op::lexi>/8/3                  2.29 ns         1.08 ns   1000000000
bm<uint8_t, op::lexi>/24/22                2.77 ns         1.48 ns   1000000000
bm<uint8_t, op::lexi>/105/-1               4.29 ns         2.16 ns   1000000000
bm<uint8_t, op::lexi>/4021/3056            73.6 ns         35.7 ns     50779001
bm<int8_t, op::lexi>/8/3                   2.35 ns         1.20 ns   1000000000
bm<int8_t, op::lexi>/24/22                 12.2 ns         5.40 ns    248888889
bm<int8_t, op::lexi>/105/-1                61.6 ns         26.7 ns     52705882
bm<int8_t, op::lexi>/4021/3056             1485 ns          755 ns      2357895
bm<uint16_t, op::lexi>/8/3                 2.27 ns         1.16 ns   1000000000
bm<uint16_t, op::lexi>/24/22               11.9 ns         6.05 ns    263529412
bm<uint16_t, op::lexi>/105/-1              60.7 ns         30.7 ns     44800000
bm<uint16_t, op::lexi>/4021/3056           1486 ns          680 ns      1723077
bm<uint32_t, op::lexi>/8/3                 2.18 ns         1.03 ns   1000000000
bm<uint32_t, op::lexi>/24/22               11.9 ns         5.85 ns    229743590
bm<uint32_t, op::lexi>/105/-1              59.1 ns         35.6 ns     37333333
bm<uint32_t, op::lexi>/4021/3056           1488 ns          643 ns      1629091
bm<uint64_t, op::lexi>/8/3                 2.17 ns         1.05 ns   1000000000
bm<uint64_t, op::lexi>/24/22               12.0 ns         4.90 ns    242162162
bm<uint64_t, op::lexi>/105/-1              59.1 ns         27.5 ns     47157895
bm<uint64_t, op::lexi>/4021/3056           1474 ns          750 ns      1792000

After:

Benchmark                                     Time             CPU   Iterations
bm<uint8_t, op::lexi>/8/3                  2.69 ns         1.98 ns    995555556
bm<uint8_t, op::lexi>/24/22                2.65 ns         2.09 ns    560000000
bm<uint8_t, op::lexi>/105/-1               4.30 ns         2.59 ns    640000000
bm<uint8_t, op::lexi>/4021/3056            59.4 ns         38.8 ns     42666667
bm<int8_t, op::lexi>/8/3                   2.67 ns         1.61 ns    814545455
bm<int8_t, op::lexi>/24/22                 2.66 ns         1.52 ns    689230769
bm<int8_t, op::lexi>/105/-1                4.25 ns         2.78 ns    426666667
bm<int8_t, op::lexi>/4021/3056             58.9 ns         39.1 ns     27151515
bm<uint16_t, op::lexi>/8/3                 3.09 ns         1.59 ns    689230769
bm<uint16_t, op::lexi>/24/22               3.76 ns         2.73 ns    527058824
bm<uint16_t, op::lexi>/105/-1              6.14 ns         3.57 ns    280000000
bm<uint16_t, op::lexi>/4021/3056           99.3 ns         67.8 ns     18666667
bm<uint32_t, op::lexi>/8/3                 2.73 ns         1.99 ns    746666667
bm<uint32_t, op::lexi>/24/22               3.76 ns         2.17 ns    597333333
bm<uint32_t, op::lexi>/105/-1              8.80 ns         6.49 ns    373333333
bm<uint32_t, op::lexi>/4021/3056            193 ns          143 ns     12800000
bm<uint64_t, op::lexi>/8/3                 2.68 ns         1.66 ns    896000000
bm<uint64_t, op::lexi>/24/22               5.10 ns         3.85 ns    426666667
bm<uint64_t, op::lexi>/105/-1              15.1 ns         9.51 ns    154482759
bm<uint64_t, op::lexi>/4021/3056            379 ns          291 ns      5270588

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner April 1, 2024 19:59
@StephanTLavavej StephanTLavavej self-assigned this Apr 1, 2024
@StephanTLavavej StephanTLavavej added the performance Must go faster label Apr 1, 2024
@StephanTLavavej StephanTLavavej changed the title Vectorize lexicographical_compare! Vectorize lexicographical_compare! Apr 3, 2024
@StephanTLavavej StephanTLavavej removed their assignment Apr 7, 2024
@StephanTLavavej StephanTLavavej self-assigned this Apr 8, 2024
@StephanTLavavej

Thanks - this is so much easier to follow! 😻 🧠

@StephanTLavavej

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 9839187 into microsoft:main Apr 12, 2024
35 checks passed
@StephanTLavavej

Thanks for vectorizing one of the STL's most important algorithms! 😻 🚀 🥇

@AlexGuteniev AlexGuteniev deleted the not_another_vector_algorithms_cpp_change branch April 12, 2024 18:22
Development

Successfully merging this pull request may close these issues.

vectorize lexicographical_compare, lexicographical_compare_three_way