Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Floating minmax: fix negative zero handling and dedicated test coverage for arrays of +0.0 and -0.0 only #4734

Merged
merged 38 commits into from
Aug 25, 2024

Conversation

AlexGuteniev
Copy link
Contributor

@AlexGuteniev AlexGuteniev commented Jun 18, 2024

Initially I thought that it could be fixed by using careful minmax implementation, that selects correctly either the first or the last value when the comparands are equivalent.

I've learned the behavior of [v]{min|max}{s|p}{s|d} instructions (thanks @statementreply and @Alcaro for enlightening me on that), figured out that it was possible to control which of the equivalent values is the result, also I've reported the compiler bug DevCom-10686775, and found a reliable workaround for it.

Unfortunately, the control over a single minmax instruction result is not enough. The whole value-based vectorization appoach does not work well with order requirements for equivalent elements Efficient vectorization requires vertical comparisons (same elements on different vector values) to be performed first, and horiziontal comparisons (different elements on the same vector value) to be performed last.

With index-based approach, as in minmax_element, changed order is fine, as we're looking for smallest/greatest index among equal elements.

As a result, we have to resort to using minmax_element approach for floating minmax, unless /fp:fast is specified. Should be not a big loss though -- the benchmark results in #4659 shows that smaller types benefit from minmax approach a lot, but floats not a lot. Definitely still way faster than scalar.

/fp:fast is still fine, as the compiler takes advantage of not distinguishing +0.0 and -0,0 and is able to emit vectorized minmax itself (see related issue #4453)

I decided to keep comparisons reordering for floats in -- this seems to improve the handing of NAN values, which is decided to be unsupported, but why won't keep something that accidentally does things better.


⏱️ Benchmark results

C:\Project\STL>out\bench\benchmark-minmax_element.exe --benchmark_min_time=1s --benchmark_filter=(float^|double)
2024-06-22T13:46:09+03:00
Running out\bench\benchmark-minmax_element.exe
Run on (12 X 2496 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 1280 KiB (x6)
  L3 Unified 12288 KiB (x1)
Benchmark main fix fix + /fp:fast
bm<float, 8021, Op::Min> 1184 ns 1207 ns 1221 ns
bm<float, 8021, Op::Max> 1210 ns 1212 ns 1208 ns
bm<float, 8021, Op::Both> 1357 ns 1379 ns 1362 ns
bm<float, 8021, Op::Min_val> 891 ns 1211 ns 891 ns
bm<float, 8021, Op::Max_val> 915 ns 1222 ns 883 ns
bm<float, 8021, Op::Both_val> 955 ns 1378 ns 940 ns
bm<double, 8021, Op::Min> 2246 ns 2393 ns 2352 ns
bm<double, 8021, Op::Max> 2365 ns 2393 ns 2361 ns
bm<double, 8021, Op::Both> 2719 ns 2753 ns 2727 ns
bm<double, 8021, Op::Min_val> 1880 ns 2365 ns 1849 ns
bm<double, 8021, Op::Max_val> 1877 ns 2358 ns 1868 ns
bm<double, 8021, Op::Both_val> 1933 ns 2688 ns 1913 ns

The reodreding of _mm[256]_{min|max}_p{s|d} args seems a bit unfavorable for performance, but not very much, at least the results difference is within variation.

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner June 18, 2024 20:08
@AlexGuteniev AlexGuteniev changed the title Sedicated test coverage for floating minmax of +0.0 and -0.0 only Dedicated test coverage for floating minmax of +0.0 and -0.0 only Jun 18, 2024
@StephanTLavavej StephanTLavavej added the test Related to test code label Jun 18, 2024
@StephanTLavavej StephanTLavavej self-assigned this Jun 18, 2024
@AlexGuteniev

This comment was marked as resolved.

@AlexGuteniev AlexGuteniev marked this pull request as draft June 19, 2024 07:02
@StephanTLavavej StephanTLavavej removed their assignment Jun 19, 2024
@StephanTLavavej StephanTLavavej added the blocked Something is preventing work on this label Jun 19, 2024
@AlexGuteniev

This comment was marked as resolved.

@AlexGuteniev AlexGuteniev marked this pull request as ready for review June 20, 2024 16:15
@AlexGuteniev AlexGuteniev changed the title Dedicated test coverage for floating minmax of +0.0 and -0.0 only Floating minmax: fix negative zero handling and dedicated test coverage for arrays of +0.0 and -0.0 only Jun 20, 2024
@StephanTLavavej StephanTLavavej removed the blocked Something is preventing work on this label Jun 20, 2024
@StephanTLavavej StephanTLavavej self-assigned this Jun 20, 2024
@StephanTLavavej StephanTLavavej added bug Something isn't working and removed test Related to test code labels Jun 20, 2024
@StephanTLavavej

This comment was marked as resolved.

@AlexGuteniev AlexGuteniev marked this pull request as ready for review June 21, 2024 14:10
@StephanTLavavej

This comment was marked as resolved.

tests/std/include/test_vector_algorithms_support.hpp Outdated Show resolved Hide resolved
tests/std/include/test_vector_algorithms_support.hpp Outdated Show resolved Hide resolved
tests/std/include/test_vector_algorithms_support.hpp Outdated Show resolved Hide resolved
stl/inc/xutility Outdated Show resolved Hide resolved
stl/src/vector_algorithms.cpp Outdated Show resolved Hide resolved
@StephanTLavavej StephanTLavavej removed their assignment Aug 20, 2024
@StephanTLavavej StephanTLavavej self-assigned this Aug 25, 2024
@StephanTLavavej
Copy link
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej
Copy link
Member

I had to push an additional commit to drop my eternal nemeses:

C:\Temp>cl /EHsc /nologo /W4 /fp:strict meow.cpp
meow.cpp

C:\Temp>cl /clr /nologo /W4 /fp:strict meow.cpp
cl : Command line error D8016 : '/clr' and '/fp:strict' command-line options are incompatible

C:\Temp>cl /clr:pure /nologo /W4 /fp:strict meow.cpp
cl : Command line warning D9035 : option 'clr:pure' has been deprecated and will be removed in a future release
cl : Command line error D8016 : '/clr:pure' and '/fp:strict' command-line options are incompatible

@AlexGuteniev
Copy link
Contributor Author

With #162 it could have been noticed in advance

@StephanTLavavej StephanTLavavej merged commit 3705e36 into microsoft:main Aug 25, 2024
39 checks passed
@StephanTLavavej
Copy link
Member

Thanks for setting the maximum number of bugs in this area to negative zero! ➖ 0️⃣ 😹

@AlexGuteniev AlexGuteniev deleted the float_zeros branch August 25, 2024 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants