-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Description
Originally reported as DevCom-1578571 by Jörn Heusipp.
VS2022 cl 19.30.30705 generates non-universally-available tzcnt instruction for expression
std::countr_one(static_cast<unsigned char>(0b11111110)) != 0minimal test case:
#include <bit> bool fail() { return std::countr_one(static_cast<unsigned char>(0b11111110)) != 0; }The problem can be seen in the asm output of the attached file
fail.cppinfail.codwhen built withcl.exe /c /O2 /W4 /std:c++20 /FAcu /arch:IA32 fail.cpp(as done by runningcompile_and_run.cmd). (see test.zip )MSVC generates
bool fail()as (comments removed)?fail@@YA_NXZ PROC ; fail, COMDAT 00000 51 push ecx 00001 b8 01 ff ff ff mov eax, -255 ; ffffff01H 00006 f3 0f bc c0 tzcnt eax, eax 0000a 0f 95 c0 setne al 0000d 59 pop ecx 0000e c3 ret 0 ?fail@@YA_NXZ ENDP ; failwhich uses the
tzcntinstruction.The
tzcntinstruction is part of the “BMI1” x86 instruction set extensions (see https://www.felixcloutier.com/x86/tzcnt) which cannot be assumed to be universally available, especially not when compiling x86 32bit code with/arch:IA32.In case of CPUs that do not implement
tzcntit is documented as falling back tobsf(https://www.felixcloutier.com/x86/bsf) by the CPU. Especially notice the different semantics regarding setting the zero flag (ZF).On my Intel Sandy Bridge (i7-3930K) where “BMI1” is not present in the cpuid feature flags, the generated code causes
tzcnt(interpreted asbsf) to not setZF, which causessetneto setalto 1, which is the wrong result.The same problem is also reproducible on an Intel Xeon E5-2697 v2 (Ivy Bridge) and an AMD Phenom X4 9650 (K10), which also do not support
BMI1. Wikipedia agrees (https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set#Supporting_CPUs).
compile_and_run.cmd(in a VS2022 x86 prompt) should outputfailon systems withoutBMI1andokon systems withBMI1.Note that MSVC also generates
tzcnton amd64 where it is also not generally available, certainly not when targetting only SSE2, so this bug does affects both, x86 and amd64 code generation.AVX extensions also do not imply BMI1 extensions availability. I have not researched whether AVX2 implies BMI1 on all known hardware.
The bug was originally triggered by the libopenmpt (https://lib.openmpt.org/) unit tests (https://github.com/OpenMPT/openmpt/blob/master/build/vs2022win10/libopenmpt_test.sln “Release”) when built with VS2022 in C++20 mode (see https://github.com/OpenMPT/openmpt/blob/master/src/mpt/base/bit.hpp#L185) targetting x86 or amd64. See https://github.com/OpenMPT/openmpt/blob/master/src/mpt/base/tests/tests_base_bit.hpp#L163, where it triggers failures in multiple test cases on a Sandy Bridge CPU.
Additional reference: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf.
https://godbolt.org/z/aPjnojjcx. Note how the compiler makes use of ZF in compare_0. tzcnt sets ZF when the result is 0, but bsf sets ZF when the input is 0.