Implement muladd
#9840
Conversation
Fantastic, thanks.
@@ -986,6 +986,20 @@ static Value *emit_intrinsic(intrinsic f, jl_value_t **args, size_t nargs,
                                         ArrayRef<Type*>(x->getType())),
                            FP(x), FP(y), FP(z));
    }
    HANDLE(muladd_float,3)
#ifdef LLVM34
The API changed for only v3.4? Will this work on v3.3 and versions >= v3.5?
`LLVM34` doesn't do what you probably think it does -- it means "LLVM 3.4 or later".
Ah yes, sorry for the noise.
Thanks again, it's good to have this.
These other variants are not exposed by LLVM as intrinsics. I assume that LLVM's optimizer will generate these automatically when it encounters nearby
Then why have
On the other hand, a more fine-grained set of options would be possible. For example, LLVM distinguishes between "assume there are no nans", "assume there are no infs", "don't care about the sign of zero", and "allow reciprocals instead of division". Another flag could be "flush subnormal numbers to zero". But such a large set of options would be quite confusing.
Last time I checked, LLVM was able to transform things like
into the relevant fused multiply-subtract instruction.
Okay, on a Haswell machine running LLVM 3.5:
and
So it does look like it can handle the transformations correctly.
@simonbyrne, good to know that
There is one downside, however: on a non-fma machine (using LLVM 3.3)
the code takes an explicit negation, rather than using a subtraction (the other case is fine).
Yikes, that's unfortunate.
Ah, it does appear to be fixed with LLVM 3.5, so I guess that's not too big of a problem.
Is there some way to put in a regression test for this, in case LLVM changes its mind again in the future?
That's difficult from Julia. We don't really care about what instructions LLVM generates; we care that it executes as fast as possible. Instruction selection (and register selection and instruction scheduling) isn't typically something that we worry about in Julia. I think this should be an LLVM test case instead, checking that the LLVM backend performs this transformation. I checked, and there doesn't seem to be such a check at the moment. This test
Currently Julia's own IR is represented as data, but as soon as we get to LLVM IR we just print a bunch of text. A great project would be to expose LLVM IR as data in Julia – that would make testing for this kind of thing possible and possibly even straightforward.
I've thought the same thing - #8275 (comment). Having Julia data structures for LLVM IR (and assembly too?) could be pretty useful. If there's not an existing up-for-grabs issue, should we open one?
It might be better to stop at LLVM IR. As long as we only expose LLVM IR in Julia, it's hard to write inherently non-portable code, which is kind of nice. I guess if we wanted to reify machine code, each platform could have a different platform-specific representation. Please do open an issue!
Oooh, LLVM passes written in Julia!
(This branch is based on the `fma` branch, which is likely to be merged soon (#8112). Since the implementations of `fma` and `muladd` are similar, there would otherwise be many conflicts.)

Implement `muladd(x,y,z)`, a fast way to calculate `x*y+z`.

This is very different from `fma(x,y,z)`, which also calculates `x*y+z`. `fma` is about accuracy; it guarantees that the intermediate result `x*y` is not rounded. This may be very slow on some platforms. `muladd` is guaranteed to be fast, and will use architecture-specific instructions if available. If `fma` happens to be the fastest way to perform this operation, then `muladd` will be equivalent to `fma`.