-
Notifications
You must be signed in to change notification settings - Fork 210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix _mm_test_mix_ones_zeros and _mm_testnzc_si128 #621
Conversation
I'm not sure, what all, is wrong here. One error is the use of a logical NOT (`!a[0]`) was instead of a bitwise NOT (`~_a[0]`).
Bug: `_mm_test_mix_ones_zeros` always returned true. The function wasn't reducing `zf` and `cf` to a bool before combining them. The fix proposed here isn't the most efficient, but at least it is correct. Note(s): The arguments are named incorrectly in the `_mm_test_mix_ones_zeros` documentation[0]. The second argument is the mask, as per the behavior of `_mm_test_mix_ones_zeros` with gcc and clang. This naming error seems to have propagated through both gcc[1] and llvm[2] headers but not to rust[3] headers or sse2neon[4]. [0] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=ptest&techs=SSE_ALL&ig_expand=6902 [1] https://github.com/gcc-mirror/gcc/blob/27ce74fa23c93c1189c301993cd19ea766e6bdb5/gcc/config/i386/smmintrin.h#L94 [2] https://github.com/llvm/llvm-project/blob/70535f5e609f747c28cfef699eefb84581b0aac0/clang/lib/Headers/smmintrin.h#L1130 [3] https://github.com/rust-lang/stdarch/blob/f4528dd6e85d97bb802240d7cd048b6e1bf72540/crates/core_arch/src/x86/sse41.rs#L1149 [4] https://github.com/DLTcollab/sse2neon/blob/243e90f654193c97a691b1a53213d091e02eb631/sse2neon.h#L7595
This allows us to write the call other functions without a forward declare
tests/impl.cpp
Outdated
has_ones |= _a[i] & _mask[i]; | ||
has_zeros |= ~_a[i] & _mask[i]; | ||
} | ||
int32_t result = ((has_ones != 0) & (has_zeros != 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible maybe we can add some edge cases here to test the behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure where the values are coming from, I'll leave this to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible maybe we can add some edge cases here to test the behavior?
@howjmay, Can you show relevant code accordingly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if there is a test case all the bits of mask
is zero would be good. It is a case that force the answer to be zero
Thank you for the contribution!😍🥳 |
@@ -7592,14 +7592,22 @@ FORCE_INLINE int _mm_test_all_zeros(__m128i a, __m128i mask) | |||
// zero, otherwise set CF to 0. Return 1 if both the ZF and CF values are zero, | |||
// otherwise return 0. | |||
// https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=mm_test_mix_ones_zero | |||
// Note: Argument names may be wrong in the Intel intrinsics guide. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question about this. Why do you think the argument may be wrong. Just curious about the reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 2nd argument is the mask, by behavior.
Clang, GCC, Rust, etc. pass the 2nd argument as b
to _mm_testnzc_si128
.
Version 3.4.6
of the Intel® Intrinsics Guide names the second argument mask
, while the current version 3.6.6
names the first argument mask
.
Why is it now wrong in the docs?
I assume it has something to do with AXV512 always having the mask as the first argument.
Which argument is named mask
doesn't matter for _mm_test_all_zeros
but it does for _mm_test_mix_ones_zeros
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
according to the Release notes:
on 01/24/2023
Corrected parameter order of _mm_test_all_zeros and _mm_test_mix_ones_zeros.
I don't know if this person still maintains this or not:
https://old.reddit.com/r/simd/comments/cck1kh/feedback_on_intel_intrinsics_guide/
I don't have a twitter account so I can't really track them down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following compilers also treat the 2nd arg as mask:
MSVC 19.37
ICC 2021.10.0
ICX 2023.2.1
so the docs definitely don't match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it!! Thank you for explaining!!!
I'd prefer to remain uncredited. |
Thank @aqrit for contributing. It is time to make a new release. |
Bug:
_mm_test_mix_ones_zeros
always returned true.The function wasn't reducing
zf
andcf
to a bool before combining them.The fix proposed here isn't the most efficient, but at least it is correct.
_mm_testnzc_si128
is an alias for_mm_test_mix_ones_zeros
.Note(s):
The arguments are named incorrectly in the
_mm_test_mix_ones_zeros
documentation[0].The second argument is the mask, as per the behavior of
_mm_test_mix_ones_zeros
with gcc and clang.This naming error seems to have propagated through both gcc[1] and llvm[2] headers but not to rust[3] headers or sse2neon[4].
[0] https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=ptest&techs=SSE_ALL&ig_expand=6902
[1] https://github.com/gcc-mirror/gcc/blob/27ce74fa23c93c1189c301993cd19ea766e6bdb5/gcc/config/i386/smmintrin.h#L94
[2] https://github.com/llvm/llvm-project/blob/70535f5e609f747c28cfef699eefb84581b0aac0/clang/lib/Headers/smmintrin.h#L1130
[3] https://github.com/rust-lang/stdarch/blob/f4528dd6e85d97bb802240d7cd048b6e1bf72540/crates/core_arch/src/x86/sse41.rs#L1149
[4]
sse2neon/sse2neon.h
Line 7595 in 243e90f