
Constant prop gives a different result in the presence of FMA #41450

Closed · YingboMa opened this issue Jul 2, 2021 · 12 comments
@YingboMa (Contributor) commented Jul 2, 2021

julia> foo(x=1.0) = muladd(1 + eps(x), 1 - eps(x), -1)
foo (generic function with 2 methods)

julia> foo()
0.0

julia> foo(1.0)
-4.930380657631324e-32
@YingboMa (Contributor, Author) commented Jul 2, 2021

fma works fine:

julia> goo(x=1.0) = fma(1 + eps(x), 1 - eps(x), -1)
goo (generic function with 2 methods)

julia> goo()
-4.930380657631324e-32

julia> goo(1.0)
-4.930380657631324e-32
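
The gap is exactly the ε² term: (1 + ε)(1 − ε) = 1 − ε², and in Float64 the product rounds back to 1.0 unless fusing keeps it exact. A quick check with ε = eps(1.0):

julia> ε = eps(1.0);

julia> (1 + ε) * (1 - ε)      # the exact 1 - ε^2 rounds back to 1.0
1.0

julia> (1 + ε) * (1 - ε) - 1  # so the unfused form loses the ε^2 term
0.0

julia> -ε^2                   # the fused result is exactly -ε^2
-4.930380657631324e-32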

@yuyichao (Contributor) commented Jul 2, 2021

This is the documented behavior of muladd. The whole reason it exists is that the compiler can choose to use different operations based on the context and available hardware features.

@oscardssmith (Member) commented

@yuyichao This isn't a bug per se, so much as the compiler making a dumb choice. The compiler is free to choose whether to fold a muladd into an fma, and if the result is being constant-propagated, the better choice is to fold it: once the muladd is getting constant folded anyway, the CPU architecture doesn't matter, so we should return the more correct result.

@yuyichao (Contributor) commented Jul 2, 2021

No, fma isn't better or more correct at all, and the compiler isn't making any bad or dumb choice here. In terms of the results, both are equally good. This is only about performance, and when things are constant folded both have the same performance. If some computation must be done at runtime, it should pick the one with higher performance, which isn't necessarily fma even if such an instruction exists (e.g. if a and b are both constants in a * b + c). On ARM it may not even fuse when everything is a variable, due to the existence of unfused instructions.

@oscardssmith (Member) commented

I don't think it's fair to say that both results are equally good. The result of fma(a, b, c) is the closest floating-point number to big(a)*big(b) + big(c). When someone uses muladd, they would prefer it to be fused unless there is a very good reason not to fuse, since fusing the operations produces a more accurate result. This is why it is strictly better to return the fused value if constant folding occurs.
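
That claim is easy to check for the example above (a quick verification; BigFloat's default 256-bit precision is more than enough here):

julia> a, b, c = 1 + eps(1.0), 1 - eps(1.0), -1.0;

julia> fma(a, b, c) == Float64(big(a) * big(b) + big(c))  # correctly rounded
true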

Also, on most hardware that supports fma (Intel since Broadwell, AMD since Piledriver), fma instructions have almost the same (or exactly the same) latency and throughput as a regular add instruction (https://www.agner.org/optimize/instruction_tables.pdf). As such, for most processors it would be good to fuse even if a and b are constant.

@yuyichao (Contributor) commented Jul 3, 2021

> The result of fma(a, b, c) is the closest floating-point number to big(a)*big(b) + big(c)

Which is not the definition of "good". Machine-precision floating-point math has well-defined rounding rules, and I've certainly seen algorithms that produce less accurate results when fused because of that. I.e., while mathematically speaking fma is certainly more "correct" for this single operation, that is not always the case in context, and the context is what muladd is all about anyway.
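
A classic instance of that hazard (my sketch, not a case from this thread): fusing one side of a difference of products breaks exact cancellation.

x = sqrt(2.0)
x*x - x*x          # 0.0: both products round identically, so they cancel
fma(x, x, -(x*x))  # nonzero: the exact x*x minus its rounded value survives

In context, e.g. under a sqrt or in a sign test, that leftover rounding error can be exactly what the algorithm was written to avoid.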

> When someone uses muladd, they would prefer it to be fused unless there is a very good reason not to fuse, since fusing the operations produces a more accurate result.

Which is not the case. Again, muladd is only about better performance, which may or may not use fma.

> As such, for most processors it would be good to fuse even if a and b are constant.

No; that requires loading two floating-point constants instead of one.
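
Concretely (a hypothetical illustration; the literals are chosen so 0.1*0.3 is inexact and the fma cannot legally be split into a folded constant plus an add):

f_unfused(c) = 0.1 * 0.3 + c     # compiler folds 0.1*0.3 into one constant:
                                 # one constant load + one add at runtime
f_fused(c)   = fma(0.1, 0.3, c)  # both 0.1 and 0.3 must be materialized as
                                 # operands of the fma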

@KristofferC (Member) commented

The situation here seems to be:

  • Some of the code for Julia's special functions needs fused muladd for maximum precision.
  • Improvements to the Julia optimizer mean that more of these special functions are constant propagated.
  • The constant propagation does not fuse muladd.
  • This has the unfortunate drawback of lower precision when special functions are constant propagated.

It would therefore be nice to have the option to ensure that (for a block of code) muladd is fused even when constant propagated, while at runtime it is acceptable for it not to be fused and take the small precision hit.
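
In the meantime, the only way I know to guarantee fusion (a sketch with an invented helper name, not Base API) is to spell the kernel with fma directly and accept the slow software fallback on hardware without FMA:

# Horner evaluation that always fuses, both at runtime and under constant
# folding; coefficients are ordered lowest degree first.
horner_fma(x, coeffs...) = foldr((c, acc) -> fma(x, acc, c), coeffs)

horner_fma(2.0, 1.0, 2.0, 3.0)  # 1 + 2*2 + 3*2^2 == 17.0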

@Keno (Member) commented Jul 3, 2021

I think the correct solution here is to finally go and implement the target capability query macros and, if those indicate FMA support, force an fma here.
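
Until such query macros exist, one probe in the same spirit (a sketch; if I remember right, Base.Math has used a similar trick for an FMA_NATIVE flag) is to test at load time whether muladd actually fuses on the host:

# (1+ε)^2 - (1+2ε) equals ε^2 when fused, but rounds to zero when the
# product is rounded first, so a nonzero result means muladd fused.
const MULADD_FUSES = muladd(nextfloat(1.0), nextfloat(1.0), -nextfloat(1.0, 2)) != 0.0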

@Keno (Member) commented Jul 3, 2021

In the meantime, I also don't think there's anything wrong with forcing FMA for the muladd constprop.

aviatesk added a commit that referenced this issue Jul 13, 2021: "So that we can get more accurate results when performance doesn't matter"
@aviatesk (Member) commented Jul 13, 2021

Trying to solve this with #41564.

@simonbyrne (Contributor) commented

> I think the correct solution here is to finally go and implement the target capability query macros and, if those indicate FMA support, force an fma here.

For reference, this is issue #9855.

@simonbyrne (Contributor) commented

Oh, and #33011

@vtjnash closed this as completed Nov 17, 2021