Vectorize minmax on each code path #4913

Merged: 1 commit, Sep 9, 2024

AlexGuteniev
Contributor

Resolves #4660

(It turns out that #4660 is completely orthogonal to #4453. Initially I thought that the "prefer values" branch was inherently vectorization-friendly, or could be made so, but that is not true: that branch reduces the number of comparisons, whereas a vectorization-friendly loop should be naive.)
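
To illustrate the distinction, here is a minimal sketch, not the STL's actual code. Both function names are hypothetical, and the pairwise scheme is an assumption about what a comparison-reducing approach might look like. The naive loop does two independent comparisons per element with no data-dependent branching, which is the shape auto-vectorizers tend to handle well. The pairwise loop does about three comparisons per two elements, but its swap-and-branch structure is harder to map onto SIMD min/max instructions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (illustration only): naive min/max reduction.
// Every element is compared against both running values independently.
template <class T>
std::pair<T, T> minmax_naive(const std::vector<T>& v) {
    assert(!v.empty()); // precondition: at least one element
    T lo = v[0];
    T hi = v[0];
    for (std::size_t i = 1; i < v.size(); ++i) {
        lo = std::min(lo, v[i]);
        hi = std::max(hi, v[i]);
    }
    return {lo, hi};
}

// Hypothetical helper (illustration only): pairwise scheme.
// Orders each pair first, then compares the smaller one against the
// running minimum and the larger one against the running maximum.
template <class T>
std::pair<T, T> minmax_pairwise(const std::vector<T>& v) {
    assert(!v.empty()); // precondition: at least one element
    T lo = v[0];
    T hi = v[0];
    std::size_t i = 1;
    for (; i + 1 < v.size(); i += 2) {
        T a = v[i];
        T b = v[i + 1];
        if (b < a) {
            std::swap(a, b); // now a <= b
        }
        if (a < lo) {
            lo = a;
        }
        if (hi < b) {
            hi = b;
        }
    }
    if (i < v.size()) { // leftover element when the remaining count is odd
        if (v[i] < lo) {
            lo = v[i];
        }
        if (hi < v[i]) {
            hi = v[i];
        }
    }
    return {lo, hi};
}
```

Both helpers return the same values for the same input; they differ only in how many comparisons they do and in how regular the loop body is. Whether a given compiler actually vectorizes the naive loop depends on the element type and the optimization settings.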

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner August 26, 2024 09:34
@AlexGuteniev
Contributor Author

| Benchmark | main | Time |
|---|---|---|
| `bm<uint8_t, 8021, Op::Both_val>` | 4126 ns | 100 ns |
| `bm<uint16_t, 8021, Op::Both_val>` | 4200 ns | 229 ns |
| `bm<uint32_t, 8021, Op::Both_val>` | 361 ns | 362 ns |
| `bm<uint64_t, 8021, Op::Both_val>` | 2938 ns | 2889 ns |
| `bm<int8_t, 8021, Op::Both_val>` | 3715 ns | 102 ns |
| `bm<int16_t, 8021, Op::Both_val>` | 3156 ns | 235 ns |
| `bm<int32_t, 8021, Op::Both_val>` | 363 ns | 364 ns |
| `bm<int64_t, 8021, Op::Both_val>` | 3120 ns | 3123 ns |
| `bm<float, 8021, Op::Both_val>` | 1329 ns | 1318 ns |
| `bm<double, 8021, Op::Both_val>` | 2677 ns | 2560 ns |

@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 592b639 into microsoft:main Sep 9, 2024
39 checks passed
@StephanTLavavej
Member

Thanks for closely monitoring performance and making sure all cases are covered! 🚀 😻 💯

@AlexGuteniev AlexGuteniev deleted the small_things branch September 9, 2024 20:08
Labels
performance Must go faster
Development

Successfully merging this pull request may close these issues.

minmax 8 and 16 bit elements are not vectorized
2 participants