@statementreply asks:
I’m concerned about the correctness of our x86/x64 implementation of countl_zero on CPUs without LZCNT support.
// We use lzcnt (actually bsr if lzcnt is not supported)
The two instructions return different results: bsr(x) == integer_width - 1 - lzcnt(x) when x != 0, and the result of bsr is undefined when x == 0, so the fallback won’t work.
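The relationship above can be sketched in portable C++ for the 32-bit case. This is only an illustration, not the library's code: bsr_model, lzcnt_model, and countl_zero_via_bsr are hypothetical names, and the first two are software models of the instructions rather than the real intrinsics.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of BSR for nonzero input: index of the highest set bit.
// The real instruction leaves its destination undefined when x == 0;
// this model happens to return 0 there, which callers must not rely on.
constexpr int bsr_model(std::uint32_t x) {
    int index = 0;
    while (x >>= 1) {
        ++index;
    }
    return index;
}

// Hypothetical model of LZCNT: number of leading zero bits.
// Unlike BSR, it is defined for 0 and returns the operand width (32).
constexpr int lzcnt_model(std::uint32_t x) {
    int count = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++count;
    }
    return count;
}

// A BSR-based fallback has to convert the bit index into a leading-zero
// count and handle zero separately; using the BSR result as-is is wrong.
constexpr int countl_zero_via_bsr(std::uint32_t x) {
    return x == 0 ? 32 : 31 - bsr_model(x);
}

// For x == 1 the two models disagree as described: 0 versus 31.
static_assert(bsr_model(1u) == 0, "bsr reports a bit index");
static_assert(lzcnt_model(1u) == 31, "lzcnt reports a leading-zero count");
static_assert(countl_zero_via_bsr(1u) == lzcnt_model(1u), "adjusted fallback agrees");
```

Under these assumptions, treating the bsr result as if it were the lzcnt result gives 0 instead of 31 for x == 1, which is the kind of mismatch the question is about.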
I currently don’t have access to a computer with a pre-AVX2 CPU. Could someone help test this?