float vs int: optimising away infinity comparisons #11987
For the record, the current function is (after macro expansion):

function lt_new(x::Int, y::Float64)
    fx = Float64(x)
    (fx < y) | ((fx == y) & ((fx == Float64(typemax(Int))) | (x < unsafe_trunc(Int, fx))))
end

and the old one was effectively:

function lt_old(x::Int, y::Float64)
    fx = Float64(x)
    (Float64(typemax(Int)) <= y) | (fx < y) | ((fx == y) & (x < unsafe_trunc(Int, fx)))
end
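As a quick sanity check of the claim (made later in this thread) that these comparisons are mathematically exact, here is a brute-force sketch of my own (not from the issue; it assumes a 64-bit Int and the lt_new/lt_old definitions above): compare both against BigFloat, whose default 256-bit precision represents every Int64 and every Float64 exactly.

exact_lt(x::Int, y::Float64) = BigFloat(x) < BigFloat(y)   # exact at the default 256-bit precision

for x in (typemin(Int), -1, 0, 1, 2^53, 2^53 + 1, typemax(Int) - 1, typemax(Int))
    for y in (-Inf, -1.0, 0.0, 0.5, 1.0, 2.0^53, 2.0^63, Inf)
        @assert lt_new(x, y) == exact_lt(x, y)
        @assert lt_old(x, y) == exact_lt(x, y)
    end
end |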
Is some solution along this line (using llvmcall directly) acceptable?

@inline function ltsif64(x::Int64, y::Float64)
Base.llvmcall("""
%fx = sitofp i64 %0 to double
%3 = fcmp ogt double %1, 9.223372036854776e18
%4 = fcmp olt double %fx, %1
%5 = or i1 %3, %4
%6 = fcmp oeq double %fx, %1
%7 = fptosi double %fx to i64
%8 = icmp slt i64 %0, %7
%9 = and i1 %6, %8
%10 = or i1 %5, %9
ret i1 %10
""", Bool, Tuple{Int64, Float64}, x, y)
end
println(ltsif64(1, 2.0))
foo1(x) = x < Inf
foo2(x) = ltsif64(x, Inf)
@code_llvm (<)(1, Inf)
@code_llvm ltsif64(1, Inf)
@code_llvm foo1(1)
@code_llvm foo2(1)

Output:

true
define i1 @"julia_<_21642"(i64, double) {
top:
%2 = sitofp i64 %0 to double
%3 = fcmp olt double %2, %1
%4 = fcmp oeq double %2, %1
%5 = fcmp oeq double %2, 0x43E0000000000000
%6 = fptosi double %2 to i64
%7 = icmp sgt i64 %6, %0
%8 = or i1 %5, %7
%9 = and i1 %4, %8
%10 = or i1 %3, %9
ret i1 %10
}
define i1 @julia_ltsif64_21643(i64, double) {
top:
%fx.i = sitofp i64 %0 to double
%2 = fcmp ogt double %1, 0x43E0000000000000
%3 = fcmp olt double %fx.i, %1
%4 = or i1 %2, %3
%5 = fcmp oeq double %fx.i, %1
%6 = fptosi double %fx.i to i64
%7 = icmp sgt i64 %6, %0
%8 = and i1 %5, %7
%9 = or i1 %4, %8
ret i1 %9
}
define i1 @julia_foo1_21645(i64) {
top:
%1 = sitofp i64 %0 to double
%2 = fcmp olt double %1, 0x7FF0000000000000
%3 = fcmp oeq double %1, 0x7FF0000000000000
%4 = fcmp oeq double %1, 0x43E0000000000000
%5 = fptosi double %1 to i64
%6 = icmp sgt i64 %5, %0
%7 = or i1 %4, %6
%8 = and i1 %3, %7
%9 = or i1 %2, %8
ret i1 %9
}
define i1 @julia_foo2_21646(i64) {
top:
ret i1 true
}

The output of the comparison is slightly different; not sure if there was some bug fix or if this is the reason LLVM doesn't optimize it. |
This can probably be combined with a staged function to make the code more compact... |
@yuyichao The problem is that it doesn't optimise to the most efficient code when the integer part is constant (which was the reason for my original change):

julia> ispos1(y) = 0 < y; @code_llvm ispos1(1.0)
define i1 @julia_ispos1_20586(double) {
top:
%1 = fcmp ogt double %0, 0.000000e+00
ret i1 %1
}
julia> ispos2(y) = ltsif64(0,y); @code_llvm ispos2(1.0)
define i1 @julia_ispos2_20588(double) {
top:
%1 = fcmp ogt double %0, 0x43E0000000000000
%2 = fcmp ogt double %0, 0.000000e+00
%3 = or i1 %1, %2
ret i1 %3
} |
Please don't use a staged function for something like this!!
|
So it seems that LLVM does do range analysis for integer values, just not floating-point values:

julia> foo(x) = (x > 1) | (x > 3)
foo (generic function with 1 method)
julia> @code_llvm foo(1)
define i1 @julia_foo_20617(i64) {
top:
%1 = icmp sgt i64 %0, 1
ret i1 %1
}
julia> @code_llvm foo(1.0)
define i1 @julia_foo_20618(double) {
top:
%1 = fcmp ogt double %0, 1.000000e+00
%2 = fcmp ogt double %0, 3.000000e+00
%3 = or i1 %1, %2
ret i1 %3
}

A possible JSOC/GSOC project? |
Actually, what's the reason for doing those extra comparisons in Julia? Clang just generates a simple cast and comparison for this. |
The main reason is so that all numeric comparisons are mathematically exact, e.g.

julia> typemax(Int64) < Float64(typemax(Int64))
true
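For anyone puzzled by that result: typemax(Int64) = 2^63 - 1 is not exactly representable as a Float64, so the conversion rounds it up to 2.0^63, which really is larger. A quick illustrative check (added here, not part of the original comment):

julia> Float64(typemax(Int64)) == 2.0^63
true

julia> typemax(Int64) < 2.0^63
true |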
Edit: Sorry I looked at the wrong version. |
Hmm, that's not what I'm getting:

julia> @code_native foo3(1.0)
.section __TEXT,__text,regular,pure_instructions
Filename: none
Source line: 1
pushq %rbp
movq %rsp, %rbp
xorpd %xmm1, %xmm1
ucomisd %xmm1, %xmm0
Source line: 1
seta %al
popq %rbp
ret
julia> @code_native foo4(1.0)
.section __TEXT,__text,regular,pure_instructions
Filename: none
Source line: 1
pushq %rbp
movq %rsp, %rbp
xorpd %xmm1, %xmm1
ucomisd %xmm1, %xmm0
seta %cl
movabsq $13024516864, %rax ## imm = 0x308525B00
ucomisd (%rax), %xmm0
seta %al
orb %cl, %al
Source line: 1
popq %rbp
ret
julia> versioninfo()
Julia Version 0.4.0-dev+4921
Commit 6df5820* (2015-05-21 03:23 UTC)
Platform Info:
System: Darwin (x86_64-apple-darwin14.3.0)
CPU: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3 |
Summary: this is a job for LLVM, and most of the logic is already there. LLVM does some range analysis for integers but not floating-point. I prototyped a simple floating-point range analysis a while back but never pursued getting it into LLVM since there was a narrower solution for my issue. However, LLVM has a pass ("instruction combine") that does peephole optimization of SSA expressions, and that would seem sufficient to deal with Simon's example: it should be straightforward to add rules to fold these comparisons. I looked at LLVM svn (and 3.3), and it looks like there is already a routine in InstCombine that folds comparisons of an integer-to-float conversion against a floating-point constant.

Alas, the routine punts if the integer is wider than the mantissa after conversion. We should be able to refine it to not punt if the constant has such a large magnitude that loss of precision in the input is irrelevant.
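To make that concrete, here is a rough model in Julia of the extra rule for the less-than case (my own sketch for illustration, not the LLVM code): every Int64 converts to a double in [-2.0^63, 2.0^63], so once the constant lies outside that range (including ±Inf) the result no longer depends on the integer, and rounding in the conversion cannot matter.

# Illustration only, not LLVM's implementation: folding `sitofp(x::Int64) < C`
# for a constant C.  Every converted Int64 lies in [-2.0^63, 2.0^63].
function fold_sitofp_lt(C::Float64)
    C > 2.0^63   && return true     # every converted Int64 is smaller (e.g. C == Inf)
    C <= -2.0^63 && return false    # no converted Int64 is smaller
    return nothing                  # cannot fold (includes NaN); keep the full comparison
end

In particular fold_sitofp_lt(Inf) === true, which is exactly the fcmp-olt piece of the x < Inf expansion above; once that folds to true, the surrounding | collapses to true as well. |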
I got the infinity case working with a two-line change to the InstCombine routine mentioned above. |
Ah, very nice.

On a different note: I didn't realise that there were also inbuilt optimisations for the LLVM math intrinsics. Something like the following pseudocode could perhaps expose the same facts for Julia's libm calls:

function sinh(x::Float64)
y = ccall((:sinh,libm), Float64, (Float64,), x)
@llvm_assert CannotBeOrderedLessThanZero(y) = CannotBeOrderedLessThanZero(x)
return y
end |
I found out that LLVM supports integers with up to 2^23-1 bits.

We really should be calling the LLVM intrinsics so we can exploit the inbuilt optimizations. I recall there was some issue having to do with resolving the intrinsics to Julia's libm during linking. We ought to get that fixed, or, as a work-around, add a late pass that maps the intrinsics back to Julia's libm.
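As a rough illustration of what calling an LLVM intrinsic directly could look like (a sketch in the spirit of the ltsif64 example above, not how Base is actually wired up), one can declare and call an intrinsic such as llvm.sqrt.f64 via llvmcall, so the optimizer sees the intrinsic rather than an opaque ccall:

# Sketch only: route a math function through the LLVM intrinsic instead of an
# opaque ccall into libm, so LLVM's built-in knowledge about it is available
# to the optimizer.
llvm_sqrt(x::Float64) = Base.llvmcall(
    ("declare double @llvm.sqrt.f64(double)",
     """
     %y = call double @llvm.sqrt.f64(double %0)
     ret double %y
     """),
    Float64, Tuple{Float64}, x)

llvm_sqrt(2.0)   # 1.4142135623730951 |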
I've posted a patch to the LLVM commits mailing list for review. @simonbyrne: it might be worth looking it over to check my logic. Even if you're not familiar with LLVM internals, I think the logic should be understandable. You'll probably need to create a Phabricator login to get at the review diff. I've put a copy of the diff here. |
I just got around to looking at this today: from what I can tell, it seems correct. Two ways it could be tightened further:
I don't know if these are worth the effort though. |
@simonbyrne: Thanks for the review. For the sake of simplicity I'll skip the tightening. |
Fixed in LLVM trunk by revision r247708 on 2015-09-15, which committed the patch http://reviews.llvm.org/D11210. I won't close the issue until we decide whether to backport the fix to our patch for LLVM 3.3. |
It would be worth nominating that for 3.7.1 if it's non-disruptive. |
Fixed on master.
|
Any way we can test for this? |
We don't really have codegen tests; see #13777 for some relevant discussion. Adding it to Nanosoldier is probably the most straightforward option.
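For what it's worth, the generated IR can also be checked directly with a very low-tech sketch (illustration only, not an existing test in Base): once the LLVM fold is in place, x < Inf for an Int64 argument should compile down to ret i1 true.

using Test

lt_inf(x::Int64) = x < Inf
ir = sprint(code_llvm, lt_inf, Tuple{Int64})
@test occursin("ret i1 true", ir)   # holds once the fcmp-against-Inf fold lands |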
In 0.3:
In 0.4:
The difference is due to a change in #9030, which I made to allow for optimising expressions where the integer is constant, such as

x > 0

(which is probably more valuable). But is there a way we can have both? I assume that this will probably require some sort of range analysis pass at the LLVM level, but maybe someone can come up with a magic sequence that works.
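For concreteness, the two patterns being traded off can be written as (restated here as a sketch):

foo(x::Int)       = x < Inf   # folded to a constant true in 0.3, but not in 0.4
ispos(y::Float64) = 0 < y     # the motivation for #9030: folds to a single fcmp in 0.4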
This arose in #11977