
Constant prop gives a different result in the presence of FMA #41450

Closed · YingboMa opened this issue Jul 2, 2021 · 12 comments
@YingboMa (Contributor) commented Jul 2, 2021

julia> foo(x=1.0) = muladd(1 + eps(x), 1 - eps(x), -1)
foo (generic function with 2 methods)

julia> foo()
0.0

julia> foo(1.0)
-4.930380657631324e-32
@YingboMa (Contributor, Author) commented Jul 2, 2021

fma works fine:

julia> goo(x=1.0) = fma(1 + eps(x), 1 - eps(x), -1)
goo (generic function with 2 methods)

julia> goo()
-4.930380657631324e-32

julia> goo(1.0)
-4.930380657631324e-32
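
The gap is exactly the ε² term: (1 + ε)(1 − ε) = 1 − ε², and in Float64 the product rounds back to 1.0 unless fusing keeps it exact. A quick check with ε = eps(1.0):

julia> ε = eps(1.0);

julia> (1 + ε) * (1 - ε)      # the exact 1 - ε^2 rounds back to 1.0
1.0

julia> (1 + ε) * (1 - ε) - 1  # so the unfused form loses the ε^2 term
0.0

julia> -ε^2                   # the fused result is exactly -ε^2
-4.930380657631324e-32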

@yuyichao (Contributor) commented Jul 2, 2021

This is the documented behavior of muladd. The whole reason it exists is that the compiler can choose to use different operations based on the context and available hardware features.

@oscardssmith (Member) commented

@yuyichao This isn't a bug per se, so much as the compiler making a dumb choice. The compiler is free to choose whether to fold a muladd into an fma, and if the result is being constant-propagated, the better choice is to fold it: once the muladd is getting constant folded anyway, the CPU architecture doesn't matter, so we should return the more correct result.

@yuyichao (Contributor) commented Jul 2, 2021

No, fma isn't better or more correct at all, and the compiler isn't making any bad or dumb choice here. In terms of the results, both are equally good. This is only about performance, and when things are constant folded both have the same performance. If some computation must be done at runtime, it should pick the one with higher performance, which isn't necessarily fma even if such an instruction exists (e.g. if a and b are both constants in a * b + c). On ARM it may not even fuse when everything is a variable, due to the existence of unfused instructions.

@oscardssmith (Member) commented

I don't think it's fair to say that both results are equally good. The result of fma(a, b, c) is the closest floating-point number to big(a)*big(b) + big(c). When someone uses muladd, they would prefer it to be fused unless there is a very good reason not to fuse, since fusing the operations produces a more accurate result. This is why it is strictly better to return the fused value if constant folding occurs.
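
That claim is easy to check for the example above (a quick verification; BigFloat's default 256-bit precision is more than enough here):

julia> a, b, c = 1 + eps(1.0), 1 - eps(1.0), -1.0;

julia> fma(a, b, c) == Float64(big(a) * big(b) + big(c))  # correctly rounded
true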

Also, on most hardware that supports fma (Intel since Broadwell, AMD since Piledriver), fma instructions have almost the same (or exactly the same) latency and throughput as a regular add instruction (https://www.agner.org/optimize/instruction_tables.pdf). As such, for most processors it would be good to fuse even if a and b are constant.

@yuyichao (Contributor) commented Jul 3, 2021

> The result of fma(a, b, c) is the closest floating-point number to big(a)*big(b) + big(c)

Which is not the definition of "good". Machine-precision floating-point math has well-defined rounding rules, and I've certainly seen algorithms that produce less accurate results when fused because of that. I.e., while mathematically speaking fma is certainly more "correct" for this single operation, that is not always the case in context, and the context is what muladd is all about anyway.
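
A classic instance of that hazard (my sketch, not a case from this thread): fusing one side of a difference of products breaks exact cancellation.

x = sqrt(2.0)
x*x - x*x          # 0.0: both products round identically, so they cancel
fma(x, x, -(x*x))  # nonzero: the exact x*x minus its rounded value survives

In context, e.g. under a sqrt or in a sign test, that leftover rounding error can be exactly what the algorithm was written to avoid.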

> When someone uses muladd, they would prefer it to be fused unless there is a very good reason not to fuse, since fusing the operations produces a more accurate result.

Which is not the case. Again, muladd is only about better performance, which may or may not use fma.

> As such, for most processors it would be good to fuse even if a and b are constant.

No; that requires loading two floating-point constants instead of one.
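
Concretely (a hypothetical illustration; the literals are chosen so 0.1*0.3 is inexact and the fma cannot legally be split into a folded constant plus an add):

f_unfused(c) = 0.1 * 0.3 + c     # compiler folds 0.1*0.3 into one constant:
                                 # one constant load + one add at runtime
f_fused(c)   = fma(0.1, 0.3, c)  # both 0.1 and 0.3 must be materialized as
                                 # operands of the fma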

@KristofferC (Member) commented

The situation here seems to be:

  • Some of the code for Julia's special functions needs fused muladd for maximum precision.
  • Improvements to the Julia optimizer mean that more of these special functions are constant propagated.
  • The constant propagation does not fuse muladd.
  • This has the unfortunate drawback of lower precision when special functions are constant propagated.

It would therefore be nice to have the option to ensure that (for a block of code) muladd is fused even when constant propagated, while at runtime it is acceptable for it not to be fused and take the small precision hit.
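
In the meantime, the only way I know to guarantee fusion (a sketch with an invented helper name, not Base API) is to spell the kernel with fma directly and accept the slow software fallback on hardware without FMA:

# Horner evaluation that always fuses, both at runtime and under constant
# folding; coefficients are ordered lowest degree first.
horner_fma(x, coeffs...) = foldr((c, acc) -> fma(x, acc, c), coeffs)

horner_fma(2.0, 1.0, 2.0, 3.0)  # 1 + 2*2 + 3*2^2 == 17.0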

@Keno (Member) commented Jul 3, 2021

I think the correct solution here is to finally go and implement the target capability query macros and, if those indicate FMA support, force an fma here.
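
Until such query macros exist, one probe in the same spirit (a sketch; if I remember right, Base.Math has used a similar trick for an FMA_NATIVE flag) is to test at load time whether muladd actually fuses on the host:

# (1+ε)^2 - (1+2ε) equals ε^2 when fused, but rounds to zero when the
# product is rounded first, so a nonzero result means muladd fused.
const MULADD_FUSES = muladd(nextfloat(1.0), nextfloat(1.0), -nextfloat(1.0, 2)) != 0.0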

@Keno (Member) commented Jul 3, 2021

In the meantime, I also don't think there's anything wrong with forcing FMA for the muladd constprop.

aviatesk added a commit that referenced this issue Jul 13, 2021: "So that we can get more accurate results when performance doesn't matter"
@aviatesk (Member) commented Jul 13, 2021

Trying to solve this with #41564.

@simonbyrne (Contributor) commented

> I think the correct solution here is to finally go and implement the target capability query macros and, if those indicate FMA support, force an fma here.

For reference, this is issue #9855.

@simonbyrne (Contributor) commented

Oh, and #33011

@vtjnash closed this as completed Nov 17, 2021