
Use LLVM intrinsics for floor, ceil, trunc, and round #5983

Closed
simonster opened this issue Feb 28, 2014 · 12 comments
Labels
performance Must go faster

Comments

@simonster
Member

Right now we call openlibm for these, but that is really slow (see my benchmarks versus VML). I think LLVM can optimize the intrinsics into a single instruction. Might be easiest to wait until we have #5046.

@lindahua
Contributor

I think sqrt can also be done with a single instruction on modern CPUs.

@simonster
Member Author

The openlibm sqrt function just uses the appropriate instruction. I tested the LLVM intrinsic a while ago and it was not noticeably faster; the overhead of the function call is negligible compared to the instruction's reciprocal throughput. But it is possible that there is something to be gained here by using the sqrt intrinsic in loops that do more than just sqrt, or in conjunction with partial unrolling, which the loop vectorizer should do for us.

@lindahua
Contributor

LLVM can vectorize a small number of commonly used math functions (see http://llvm.org/docs/Vectorizers.html#vectorization-of-function-calls). It is useful to emit intrinsics for these functions (instead of ccall), which would make @simd even more useful.

@ArchRobison
Contributor

Alas, sqrt has a branch, so it won't vectorize with the current @simd. Making floor, ceil, and trunc into intrinsics enables LLVM 3.5 to vectorize them on a Haswell. Alas, round refuses to vectorize with the same setup. I'll send a PR once I'm happy with it.

@simonbyrne
Contributor

@ArchRobison My guess (which could be completely wrong) for why round doesn't vectorise is that its definition doesn't exactly match the hardware instruction. According to the Intel instruction set reference, the ROUNDxx instructions in round-to-nearest mode break ties by rounding to even (table 4-15 here). The C functions round/roundf break ties by rounding away from 0 (p. 233 here), which presumably requires a branch at some point. The VML folks presumably have some clever trick to avoid the branch.

I would be in favour of a new function (rounde?) that matches the instruction, or failing that, exporting rint or nearbyint functions which depend on the current rounding mode (which is usually round-to-even).

@ViralBShah
Member

What if we had a SIMD module with versions that vectorise? A kludge, but it would come with a buyer-beware notice.

@ArchRobison
Contributor

I'd like to see this PR pulled and then open a new issue on the other functions. They involve significantly heavier hacking.

Here is the list of function calls that LLVM can vectorize. rint and round are conspicuously absent, given that there are LLVM intrinsics for them; apparently the LLVM vectorizer can't handle them yet.

One way to correctly vectorize round would be for Julia to generate LLVM code that LLVM does know how to vectorize. round can be defined with straight-line code that involves trunc, ifelse, etc. and careful attention to some boundary cases. The sequence is a bit long, so it's probably not worth doing for scalar code.

Both rint and nearbyint introduce another complexity: they depend on the current global rounding mode. Does Julia yet have a means to control that mode? Do we even want that? Global state like that has always seemed a deeply flawed idea to me. Lexical scoping seems saner.

I would also be in favor of adding rounde, particularly for a language that promotes numerical accuracy. I was shocked to find out that the libm round rounds away from zero in case of a tie, since doing so can bias results. And rint and nearbyint can be munged by global rounding state. Does anyone know if any other LLVM target languages or hardware have a 'rounde'-like function? We'd need examples to get it into upstream LLVM. Otherwise we'll have to carry around a custom LLVM patch for Julia.

@ArchRobison
Contributor

I updated my previous remarks after I saw that Julia has support for the IEEE rounding state. Is there yet a Julia interface to rint and nearbyint?

@simonbyrne
Contributor

> Does anyone know if any other LLVM target languages or hardware have a 'rounde'-like function.

I don't know if it is an LLVM target, but apparently Haskell's round rounds ties to even (I figured if any language was going to do it, Haskell would be the most likely). It is also a requirement (a "shall provide") for IEEE754-2008 compatibility (roundToIntegralTiesToEven, sections 5.3.1 & 5.9 of the standard).

> Is there yet a Julia interface to rint and nearbyint?

No, but again, rint's behaviour (raising the inexact flag when the result differs from the argument) is an IEEE754 requirement (roundToIntegralExact).

@simonbyrne
Contributor

C# .NET also rounds ties to even:
http://msdn.microsoft.com/en-us/library/wyk4d9cy(v=vs.110).aspx

@pao
Member

pao commented Sep 19, 2014

> I don't know if it is an LLVM target, but apparently Haskell round rounds ties to even

Haskell has an LLVM code generator backend as an option.

This was referenced Oct 18, 2014
@simonbyrne
Contributor

Closed by #8364.


6 participants