Implement fma #8112
Type *tz = z->getType();
Type *ts[3] = { tx, ty, tz };
return builder.CreateCall3
(jl_Module->getOrInsertFunction(tx==T_float64 ? "fma" : "fmaf",
Probably better to use the llvm intrinsic here.
I'd prefer to call
I have no preference as to the name; muladd seems better.
Fantastic! I made some general comments on #6330, but I think even if we do go down that direction, we will probably still want to expose these functions.
Thanks for doing this. I can cross it off my to-do list :-)
I don't see Yes,
@@ -6,6 +6,7 @@ namespace JL_I {
neg_int, add_int, sub_int, mul_int,
sdiv_int, udiv_int, srem_int, urem_int, smod_int,
neg_float, add_float, sub_float, mul_float, div_float, rem_float,
fma_float, muladd_float,
`muladd_float` seems to be defined as an intrinsic, but the intrinsic appears to be unused by the rest of the patch.
I see now that LLVM 3.4.2 has both the mandatory fusion I marked the first occurrence of
I don't much like the idea of opening the door to simple sequential arithmetic code giving different answers on different machines based on their performance characteristics. I'm not convinced that the performance advantage of fma over separate mul and add is ever that great. That leaves accuracy as a motivation for fma, and indeed, there are cases where you really need this. But in that case, it is incorrect to do the mul and add separately. So I guess what I'm saying is that I think we should only expose the mandatory fma operation.
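The two behaviours being weighed here can be sketched in C++. This is only an illustration under stated assumptions, not the code in this patch: `strict_fma` and `muladd` are hypothetical helper names, and the standard `FP_FAST_FMA` macro from `<cmath>` stands in for "the platform has a fast fused multiply-add".

```cpp
#include <cmath>

// Mandatory fusion: always a single rounding, even if the platform has to
// emulate the operation in software (which can be slow).
double strict_fma(double x, double y, double z) {
    return std::fma(x, y, z);
}

// Opportunistic fusion (hypothetical helper, not Julia's implementation):
// fuse only where the C library reports that fma is fast.
double muladd(double x, double y, double z) {
#ifdef FP_FAST_FMA
    return std::fma(x, y, z);  // one rounding
#else
    return x * y + z;          // normally two roundings; a compiler may still contract this
#endif
}
```

With `muladd`, the last bit of the result can differ from one machine to the next, which is the concern raised above; `strict_fma` gives the same answer everywhere at a possible cost in speed.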
If only the mandatory
@StefanKarpinski I can't comment on the performance advantage of
It's true that we can't always avoid this, but explicitly introducing things that are unpredictable feels odd.
The case for Alternatively, we can introduce
Would it also be possible to have a way of determining whether or not hardware fma is available? C has the
We could. However, it is more elegant to let the programmer specify what the algorithm needs ("accurate", "fast"), and then let Julia choose what is right for a platform. Whether an actual fma instruction will be used for a particular construct depends on the code generator, and may not be known until machine code is generated. For example, the optimizer may determine that the multiplication can be hoisted out of a loop, leaving only the addition inside the loop. If you check for such a feature, and then use an algorithm with an explicit
@eschnett It depends on what level you're programming at. For things like double-double arithmetic, you need the accuracy, but you don't want to fall back to using software fma.
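To make the double-double point concrete, here is a minimal C++ sketch of the error-free product that such arithmetic is usually built on. The names `TwoProduct` and `two_product` are hypothetical, and the sketch assumes no overflow or underflow occurs.

```cpp
#include <cmath>

// High and low parts of a product; hi + lo equals x*y exactly
// (assuming no overflow or underflow).
struct TwoProduct {
    double hi;
    double lo;
};

TwoProduct two_product(double x, double y) {
    double hi = x * y;                // rounded product
    double lo = std::fma(x, y, -hi);  // rounding error of hi, recovered exactly by the fused operation
    return {hi, lo};
}
```

The `lo` term is only correct if the fma really is fused, which is why this kind of code needs the accuracy guarantee; with a software fma it still works but is much slower.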
6c7c7e3 to 1a4c02f
I've had a quick play around with this (thanks to @xianyi for access to the openblas Haswell machine). Two things I've noticed:
hi = x*y
lo = fma(x,y,-hi)
The resulting
@simonbyrne: Are you using LLVM 3.3 or 3.5? My impression is that LLVM 3.5 has better support for Haswell instructions, though I have not played with
These look like LLVM issues to me. Can anyone confirm that the
I just updated to 3.5, but that didn't help with either issue.
I found the problem with
and
FP(x), FP(y), FP(z));
}
HANDLE(muladd_float,3)
#ifdef LLVM33
Per note, this needs to be #ifdef LLVM34, with the then/else parts swapped.
Thanks for the pointers.
I just tried this branch again: both issues I raised above now seem fixed.
I would be in favour of keeping
One difference between
there is no guarantee that I assume that there are cases where one may want a different behaviour. In my mind, Since the cases of
The Travis failure seems unrelated.
abad2b6 to cf27a93
Are there any objections to merging this?
This looks like it should be ready to merge.
I'll do the final rebase and handle the conflicts.
Rebased. Note that this only implements
`fma(x,y,z)` calculates `x*y+z` without rounding the intermediate result `x*y`.
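A small C++ sketch of what "without rounding the intermediate result" means in practice. The helper names `fused` and `unfused` are hypothetical; the `volatile` temporary is there only to force the product to be rounded to a double and to stop the compiler from fusing the two operations itself.

```cpp
#include <cmath>

// Fused: x*y + z is evaluated as if exactly, then rounded once.
double fused(double x, double y, double z) {
    return std::fma(x, y, z);
}

// Unfused: the product is rounded to double first, then the sum is rounded.
double unfused(double x, double y, double z) {
    volatile double p = x * y;  // volatile prevents contraction into an fma
    return p + z;
}
```

For example, with `x = y = 1 + 2^-30` and `z = -(1 + 2^-29)`, the exact product is `1 + 2^-29 + 2^-60`. The unfused version rounds that to `1 + 2^-29` and returns `0`, while the fused version keeps the low-order term and returns `2^-60`.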
@eschnett can you squash this again? It's best to not have test failures in the commit history to help out
What is the best way to export
@simonbyrne We can always add a new intrinsic for this.
Squashed.
Thanks again @eschnett!
👍
I've implemented fma(x,y,z) and mad(x,y,z), as a follow-up to #6330. I'd be happy to receive comments.