Skip to content

<bit>: countr_one(uint8) != 0 bad codegen on x86/x64 #2330

@statementreply

Description

@statementreply

Originally reported as DevCom-1578571 by Jörn Heusipp.

VS2022 cl 19.30.30705 generates non-universally-available tzcnt instruction for expression std::countr_one(static_cast<unsigned char>(0b11111110)) != 0

minimal test case:

#include <bit>
bool fail() {
	return std::countr_one(static_cast<unsigned char>(0b11111110)) != 0;
}

The problem can be seen in the asm output of the attached file fail.cpp in fail.cod when built with cl.exe /c /O2 /W4 /std:c++20 /FAcu /arch:IA32 fail.cpp (as done by running compile_and_run.cmd). (see test.zip )

MSVC generates bool fail() as (comments removed)

?fail@@YA_NXZ PROC					; fail, COMDAT
  00000	51		 push	 ecx
  00001	b8 01 ff ff ff	 mov	 eax, -255		; ffffff01H
  00006	f3 0f bc c0	 tzcnt	 eax, eax
  0000a	0f 95 c0	 setne	 al
  0000d	59		 pop	 ecx
  0000e	c3		 ret	 0
?fail@@YA_NXZ ENDP					; fail

which uses the tzcnt instruction.

The tzcnt instruction is part of the “BMI1” x86 instruction set extensions (see https://www.felixcloutier.com/x86/tzcnt) which cannot be assumed to be universally available, especially not when compiling x86 32bit code with /arch:IA32.

In case of CPUs that do not implement tzcnt it is documented as falling back to bsf (https://www.felixcloutier.com/x86/bsf) by the CPU. Especially notice the different semantics regarding setting the zero flag (ZF).

On my Intel Sandy Bridge (i7-3930K) where “BMI1” is not present in the cpuid feature flags, the generated code causes tzcnt (interpreted as bsf) to not set ZF, which causes setne to set al to 1, which is the wrong result.

The same problem is also reproducible on an Intel Xeon E5-2697 v2 (Ivy Bridge) and an AMD Phenom X4 9650 (K10), which also do not support BMI1. Wikipedia agrees (https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set#Supporting_CPUs).

compile_and_run.cmd (in a VS2022 x86 prompt) should output fail on systems without BMI1 and ok on systems with BMI1.

Note that MSVC also generates tzcnt on amd64 where it is also not generally available, certainly not when targetting only SSE2, so this bug does affects both, x86 and amd64 code generation.

AVX extensions also do not imply BMI1 extensions availability. I have not researched whether AVX2 implies BMI1 on all known hardware.

The bug was originally triggered by the libopenmpt (https://lib.openmpt.org/) unit tests (https://github.com/OpenMPT/openmpt/blob/master/build/vs2022win10/libopenmpt_test.sln “Release”) when built with VS2022 in C++20 mode (see https://github.com/OpenMPT/openmpt/blob/master/src/mpt/base/bit.hpp#L185) targetting x86 or amd64. See https://github.com/OpenMPT/openmpt/blob/master/src/mpt/base/tests/tests_base_bit.hpp#L163, where it triggers failures in multiple test cases on a Sandy Bridge CPU.

Additional reference: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf.

https://godbolt.org/z/aPjnojjcx. Note how the compiler makes use of ZF in compare_0. tzcnt sets ZF when the result is 0, but bsf sets ZF when the input is 0.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingfixedSomething works now, yay!

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions