`llvm.fma.bf16` intrinsic is expanded incorrectly #131531

beetrees · 2025-03-16T18:25:19Z

Consider the following LLVM IR:

define bfloat @do_fma(bfloat %a, bfloat %b, bfloat %c) {
    %res = call bfloat @llvm.fma.bf16(bfloat %a, bfloat %b, bfloat %c)
    ret bfloat %res
}

LLVM turns this into the equivalent of:

define bfloat @do_fma(bfloat %a, bfloat %b, bfloat %c) {
    %a_f32 = fpext bfloat %a to float
    %b_f32 = fpext bfloat %b to float
    %c_f32 = fpext bfloat %c to float
    %res_f32 = call float @llvm.fma.f32(float %a_f32, float %b_f32, float %c_f32)
    %res = fptrunc float %res_f32 to bfloat
    ret bfloat %res
}

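This is a miscompilation: float does not have enough precision to perform a fused multiply-add for bfloat without double rounding becoming an issue. For instance, the correctly rounded result of do_fma(0x1.40p+127, 0x1.04p+0, 0x1.00p-133) is 0x1.46p+127, but LLVM's lowering to a float FMA gives the incorrect result 0x1.44p+127. The exact product 0x1.45p+127 lies exactly halfway between those two bfloat values, and the tiny addend should push the sum above the halfway point. The float FMA rounds the addend away, so the following fptrunc sees an exact tie and rounds to even.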
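The double rounding in this example can be reproduced with exact rational arithmetic. The following is a minimal Python sketch, not LLVM code: `round_nearest_even` is a hypothetical helper that models a binary format with the given significand width and an unbounded exponent range, which is sufficient here because every value involved is in the normal range of both bfloat and float.

```python
from fractions import Fraction


def round_nearest_even(x: Fraction, precision: int) -> Fraction:
    """Round a positive rational to `precision` significant bits, ties to even.

    Hypothetical helper for illustration: overflow and subnormals are ignored.
    """
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - precision + 1)
    q = x / ulp
    n = q.numerator // q.denominator  # floor(x / ulp)
    rem = q - n
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and n % 2 == 1):
        n += 1
    return n * ulp


# Inputs from the example; all three are exactly representable as bfloat.
a = Fraction(float.fromhex("0x1.40p+127"))
b = Fraction(float.fromhex("0x1.04p+0"))
c = Fraction(float.fromhex("0x1.00p-133"))

exact = a * b + c  # infinitely precise a * b + c

# Correct FMA: a single rounding to bfloat's 8-bit significand.
direct = round_nearest_even(exact, 8)

# Modelled lowering: round to float's 24-bit significand, then to bfloat.
via_f32 = round_nearest_even(round_nearest_even(exact, 24), 8)

print(float(direct).hex())   # 0x1.4600000000000p+127
print(float(via_f32).hex())  # 0x1.4400000000000p+127
```

The first rounding lands exactly on 0x1.45p+127, so the second rounding can no longer tell that the true sum was above the tie.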

Using double instead of float would not be a correct lowering either: it gives the same incorrect result for the example above (using the reasoning from #128450 (comment), a 126 + 127 + 8 = 261-bit significand would be required for double rounding not to be a problem with this lowering). I suspect the best option here is to lower to a libcall instead.
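As a rough check of the width argument for this one input, the sketch below sweeps the significand width of the intermediate format and records which widths still produce the wrong bfloat result. It is an illustration under assumptions, not a proof of the general bound: `round_nearest_even` and `lowered` are hypothetical names, and the intermediate format is modelled with an unbounded exponent range.

```python
from fractions import Fraction


def round_nearest_even(x: Fraction, precision: int) -> Fraction:
    """Round a positive rational to `precision` significant bits, ties to even.

    Hypothetical helper for illustration: overflow and subnormals are ignored.
    """
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - precision + 1)
    q = x / ulp
    n = q.numerator // q.denominator  # floor(x / ulp)
    rem = q - n
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and n % 2 == 1):
        n += 1
    return n * ulp


a = Fraction(float.fromhex("0x1.40p+127"))
b = Fraction(float.fromhex("0x1.04p+0"))
c = Fraction(float.fromhex("0x1.00p-133"))
exact = a * b + c
expected = round_nearest_even(exact, 8)  # correctly rounded bfloat result


def lowered(intermediate_bits: int) -> Fraction:
    """FMA rounded to a wider significand first, then truncated to bfloat."""
    return round_nearest_even(round_nearest_even(exact, intermediate_bits), 8)


# Every intermediate width wider than bfloat that still gives the wrong answer.
failing = [p for p in range(9, 300) if lowered(p) != expected]

print(24 in failing, 53 in failing)  # float and double significand widths
print(max(failing))                  # widest intermediate that still fails
```

For this input, both the float width (24) and the double width (53) fail, and the widest failing intermediate is 260 bits. The exact sum has its top bit at 2^127 and its bottom bit at 2^-133, so 261 bits are needed to hold it without an intermediate rounding.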

Closely related to #98389/#128450

llvmbot added the new issue label Mar 16, 2025

hstk30-hw added llvm:ir and removed new issue labels Mar 17, 2025