Use LLVM intrinsics for floor, ceil, trunc, and round #5983

simonster · 2014-02-28T17:25:42Z

Right now we call openlibm for these, but that is really slow (see my benchmarks versus VML). I think LLVM can optimize the intrinsics into a single instruction. Might be easiest to wait until we have #5046.

lindahua · 2014-02-28T17:32:51Z

I think sqrt can also be done with a single instruction on modern CPUs.

simonster · 2014-02-28T17:42:57Z

The openlibm sqrt function just uses the appropriate instruction. I tested the LLVM intrinsic a while ago and it was not noticeably faster; the overhead of the function call is negligible compared to the instruction's reciprocal throughput. But it is possible that there is something to be gained here by using the sqrt intrinsic in loops that do more than just sqrt, or in conjunction with partial unrolling, which the loop vectorizer should do for us.

lindahua · 2014-03-13T21:14:34Z

LLVM can vectorize a small number of commonly used math functions (see http://llvm.org/docs/Vectorizers.html#vectorization-of-function-calls). It would be useful to emit intrinsics for these functions (instead of ccall), which would make @simd even more useful.

ArchRobison · 2014-09-12T23:06:01Z

Alas, sqrt has a branch, so it won't vectorize with the current @simd. Making floor, ceil, and trunc into intrinsics enables vectorization of those by LLVM 3.5 on a Haswell. Alas, round refuses to vectorize for the same setup. I'll send a PR once I'm happy with it.

simonbyrne · 2014-09-18T12:37:00Z

@ArchRobison My guess (which could be completely wrong) for why round doesn't vectorise is that its definition doesn't exactly match the hardware instruction: looking at the Intel instruction set reference, the round-to-nearest mode of the ROUNDxx instructions breaks ties by rounding to even (table 4-15 here). The C functions round/roundf break ties by rounding away from 0 (p. 233 here), which presumably requires a branch at some point. Presumably the VML folks have some clever trick to avoid the branch.

I would be in favour of a new function (rounde?) that matches the instruction, or failing that, exporting rint or nearbyint functions which depend on the current rounding mode (which is usually round-to-even).
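To make the two tie-breaking rules concrete, here is a minimal Python sketch. It is not Julia or LLVM code; the helper names `ties_even` and `ties_away` are hypothetical, and the `decimal` module is used only because it exposes both rounding rules by name.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def ties_even(x: float) -> float:
    # Round to nearest integer, ties go to the even neighbour
    # (the rule described above for the ROUNDxx round-to-nearest mode).
    return float(Decimal(x).quantize(Decimal(1), rounding=ROUND_HALF_EVEN))

def ties_away(x: float) -> float:
    # Round to nearest integer, ties go away from zero
    # (the rule of C round/roundf). decimal calls this ROUND_HALF_UP.
    return float(Decimal(x).quantize(Decimal(1), rounding=ROUND_HALF_UP))

if __name__ == "__main__":
    for x in (0.5, 1.5, 2.5, -2.5, 2.4, 2.6):
        print(x, ties_even(x), ties_away(x))
```

The two functions agree everywhere except on exact ties such as 0.5 or 2.5, which is the only place the libm definition and the instruction differ.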

ViralBShah · 2014-09-18T12:53:45Z

What if we had a SIMD module containing versions that vectorise? A kludge, but it comes with a buyer-beware notice.

ArchRobison · 2014-09-18T15:29:55Z

I'd like to see this PR pulled and then open a new issue on the other functions. They involve significantly heavier hacking.

Here is the list of function calls that LLVM can vectorize. rint and round are conspicuously absent since there are LLVM intrinsics for them, but apparently the LLVM vectorizer can't handle them yet.

One way to correctly vectorize round would be for Julia to generate LLVM code that LLVM does know how to vectorize. round can be defined with straight-line code that involves trunc, ifelse, etc. and careful attention to some boundary cases. The sequence is a bit long, so it's probably not worth doing for scalar code.
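As a rough illustration of what such a sequence could look like, here is a Python sketch of round-half-away-from-zero built from a truncation plus a select. The names `ifelse`, `ftrunc`, and `round_half_away` are hypothetical stand-ins, not Julia's or LLVM's actual definitions, and Python itself does not vectorize this; the point is only the shape of the computation and the boundary cases.

```python
import math

def ifelse(cond: bool, a: float, b: float) -> float:
    # Models a select: both a and b are computed before the choice is made,
    # so the caller's code contains no data-dependent control flow.
    return a if cond else b

def ftrunc(x: float) -> float:
    # Stand-in for a float-to-float trunc intrinsic. The guard exists only
    # because Python's math.trunc raises on NaN and infinity.
    if not math.isfinite(x):
        return x
    # copysign keeps the sign of zero, e.g. ftrunc(-0.3) is -0.0.
    return math.copysign(float(math.trunc(x)), x)

def round_half_away(x: float) -> float:
    t = ftrunc(x)
    # x - t is exact for finite x; it is NaN for NaN and infinite inputs,
    # and any comparison with NaN is False, so those inputs return t as is.
    frac = abs(x - t)
    return ifelse(frac >= 0.5, t + math.copysign(1.0, x), t)

if __name__ == "__main__":
    for x in (0.5, -0.5, 2.5, 0.49999999999999994, float("inf")):
        print(x, round_half_away(x))
```

One boundary case worth noting: the simpler formula `floor(x + 0.5)` returns 1.0 for 0.49999999999999994 because the addition rounds up to 1.0, whereas comparing the fractional part against 0.5 avoids that.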

~~Both rint and nearbyint introduce another complexity: they depend on the current global rounding mode. Does Julia yet have a means to control that mode? Do we even want that?~~ Global state like that has always seemed a deeply flawed idea to me. Lexical scoping seems saner.

I would also be in favor of adding rounde, particularly for a language that promotes numerical accuracy. I was shocked to find out that the libm round rounds away from zero in case of a tie, since doing so can bias results. And rint and nearbyint can be munged by global rounding state. Does anyone know if any other LLVM target languages or hardware have a 'rounde'-like function? We'd need examples to get it into LLVM. Otherwise we'll have to carry around a custom LLVM patch for Julia.
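The bias is easy to see on data made entirely of ties. The following Python sketch is illustrative only: it relies on Python's built-in `round`, which rounds ties to even, and uses `math.floor(x + 0.5)` as a ties-away rule that is valid for the small non-negative values used here.

```python
import math

# 1000 exact ties: 0.5, 1.5, ..., 999.5 (all exactly representable).
ties = [k + 0.5 for k in range(1000)]

true_sum = sum(ties)

# Python's built-in round() breaks ties to even.
sum_even = sum(round(x) for x in ties)

# Ties away from zero; floor(x + 0.5) is exact for these inputs.
sum_away = sum(math.floor(x + 0.5) for x in ties)

print("true sum:      ", true_sum)
print("ties to even:  ", sum_even)
print("ties away:     ", sum_away)
```

With ties to even, the upward and downward roundings cancel and the rounded sum matches the true sum; with ties away from zero, every element is pushed up by 0.5 and the error grows linearly with the number of ties.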

ArchRobison · 2014-09-18T21:21:45Z

I updated my previous remarks after I saw that Julia has support for the IEEE rounding state. Is there yet a Julia interface to rint and nearbyint?

simonbyrne · 2014-09-19T10:30:01Z

> Does anyone know if any other LLVM target languages or hardware have a 'rounde'-like function.

I don't know if it is an LLVM target, but apparently Haskell round rounds ties to even (I figured if any language was going to do it, Haskell would be the most likely). Also it is a necessary requirement (a "shall provide") for IEEE754-2008 compatibility (roundToIntegralTiesToEven, sections 5.3.1 & 5.9 of the standard).

> Is there yet a Julia interface to rint and nearbyint?

No, but again rint behaviour (raise flag on inexact) is an IEEE754 requirement (roundToIntegralExact).

simonbyrne · 2014-09-19T10:35:56Z

C# .NET also rounds ties to even:
http://msdn.microsoft.com/en-us/library/wyk4d9cy(v=vs.110).aspx

pao · 2014-09-19T12:36:41Z

> I don't know if it is an LLVM target, but apparently Haskell round rounds ties to even

Haskell has an LLVM code generator backend as an option.

simonbyrne · 2014-12-11T13:05:54Z

Closed by #8364.

simonster added the performance label Feb 28, 2014

JeffBezanson mentioned this issue Mar 13, 2014

performance regressions since 0.2 #6112

Closed

simonster mentioned this issue Sep 11, 2014

Global llvmcall statements #8308

Closed

simonbyrne mentioned this issue Sep 18, 2014

Add LLVM intrinsics for floor/ceil/trunc/abs. #8364

Merged

This was referenced Oct 18, 2014

merging itrunc into trunc, etc. #8728

Closed

round ties behaviour #8750

Closed

simonbyrne closed this as completed Dec 11, 2014
