<bit>: Is the __isa_available check for lzcnt worth the cost?

`std::bit_ceil` emits quite a bit of assembly on x64. This seems to be mostly due to [the branch in `_Checked_x86_x64_countl_zero`](https://github.com/microsoft/STL/blob/c34f24920e463a71791c2ee3d2bed14926518965/stl/inc/bit#L211-L212) and, to a lesser extent, [the branch in `bit_ceil`](https://github.com/microsoft/STL/blob/c34f24920e463a71791c2ee3d2bed14926518965/stl/inc/bit#L49) itself.
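To make the shape of that first branch concrete, here is a minimal, portable sketch of the dispatch pattern. It is not the STL's code: `g_has_lzcnt`, `clz_lzcnt_like`, `clz_bsr_like` and `checked_countl_zero` are hypothetical names, the flag merely stands in for the `__isa_available` check, and plain loops stand in for the `lzcnt` and `bsr` instructions.

```cpp
#include <cstdint>

// Hypothetical stand-in for the runtime CPU-feature flag (assumption:
// the real code consults __isa_available; this sketch does not).
static bool g_has_lzcnt = false;

// Stand-in for the lzcnt path: well defined for zero, where it yields 32.
inline int clz_lzcnt_like(std::uint32_t x) noexcept {
    int n = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++n;
    }
    return n;
}

// Stand-in for the bsr path: bsr reports the index of the highest set bit
// and gives no usable result for zero, so zero needs its own check.
inline int clz_bsr_like(std::uint32_t x) noexcept {
    if (x == 0) {
        return 32;
    }
    int highest = 31;
    while ((x >> highest) == 0) {
        --highest;
    }
    return 31 - highest;
}

// The pattern under discussion: every call pays for a runtime check
// before it reaches either implementation.
inline int checked_countl_zero(std::uint32_t x) noexcept {
    if (g_has_lzcnt) {
        return clz_lzcnt_like(x);
    }
    return clz_bsr_like(x);
}
```

Once this is inlined into a caller such as `bit_ceil`, both paths and the flag test end up at each call site, which is where the extra code size comes from.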

I've written a variant which produces a more compact result: https://godbolt.org/z/q4EEz83aW
(It also removes the extra branch on ARM64 by using conditional assignments.)
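For readers who can't open the link: the following is not the variant from the Godbolt link, just a rough sketch of what a branch-free `bit_ceil` can look like. It swaps in the classic bit-smearing technique instead of a leading-zero count, and `bit_ceil_sketch` is a hypothetical name. Like `std::bit_ceil`, it is only meaningful when the result fits in the type.

```cpp
#include <cstdint>

// Smallest power of two that is >= x, without a branch in the source.
// Assumption: x <= 0x80000000; for larger inputs the result wraps to 0
// (std::bit_ceil has undefined behavior in that range).
constexpr std::uint32_t bit_ceil_sketch(std::uint32_t x) noexcept {
    // Subtract 1 unless x is 0, so that 0 and 1 both map to 1. The
    // comparison yields 0 or 1 and needs no jump.
    x -= static_cast<std::uint32_t>(x != 0);
    // Copy the highest set bit into every lower position.
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    // All bits below the target power of two are now set; add 1 to reach it.
    return x + 1;
}
```

For example, `bit_ceil_sketch(5)` gives 8 and `bit_ceil_sketch(16)` stays 16. Whether this beats an `lzcnt`- or `bsr`-based version would need measuring.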

I've checked the PR that introduced the code (https://github.com/microsoft/STL/pull/795), and the cost of this `if` condition doesn't appear to have been discussed. The condition generally makes sense, though: `bsr` is costly on AMD CPUs (up to Zen3, 4 cycles/op latency), whereas `lzcnt` is very fast on any architecture (<= 1 cycle).
But it takes up 3 slots in the CPU's branch target buffer (contemporary hardware has ~4096 slots, and newly added branches incur an extra 5-20 cycles of latency), it generates larger binaries after inlining, and unfortunately the added instructions seem to add about 5 cycles of latency themselves, which cancels out what is saved by avoiding `bsr`.

This makes me wonder: Should we drop the `__isa_available` check?

<bit>: Is the __isa_available check for lzcnt worth the cost? #2849
