Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

float vs int: optimising away infinity comparisons #11987

Closed
simonbyrne opened this issue Jul 2, 2015 · 22 comments
Closed

float vs int: optimising away infinity comparisons #11987

simonbyrne opened this issue Jul 2, 2015 · 22 comments
Labels
compiler:codegen Generation of LLVM IR and native code performance Must go faster potential benchmark Could make a good benchmark in BaseBenchmarks

Comments

@simonbyrne
Copy link
Contributor

In 0.3:

julia> foo(x) = x < Inf; @code_llvm foo(1)

define i1 @julia_foo_20227(i64) {
top:
  ret i1 true, !dbg !761
}

In 0.4:

julia> foo(x) = x < Inf; @code_llvm foo(1)

define i1 @julia_foo_20585(i64) {
top:
  %1 = sitofp i64 %0 to double
  %2 = fcmp olt double %1, 0x7FF0000000000000
  %3 = fcmp oeq double %1, 0x7FF0000000000000
  %4 = fcmp oeq double %1, 0x43E0000000000000
  %5 = fptosi double %1 to i64
  %6 = icmp sgt i64 %5, %0
  %7 = or i1 %4, %6
  %8 = and i1 %3, %7
  %9 = or i1 %2, %8
  ret i1 %9
}

The difference is due to a change in #9030 which I made to allow for optimising expressions where the integer is constant, such as x > 0 (which is probably more valuable). But is there a way we can have both?

I assume that this will probably require some sort of range analysis pass at the LLVM level, but maybe someone can come up with a magic sequence that works.

This arose in #11977

@simonbyrne
Copy link
Contributor Author

For the record, the current function is (after macro expansion):

function lt_new(x::Int, y::Float64)
    fx = Float64(x)
    (fx < y) | ((fx == y) & (( fx == Float64(typemax(Int)) ) | (x < unsafe_trunc(Int, fx) ) ))
end

and the old one was effectively

function lt_old(x::Int, y::Float64)
    fx = Float64(x)
    (Float64(typemax(Int)) <= y) | (fx < y) | ((fx == y) & (x < unsafe_trunc(Int,fx)))
end

@simonbyrne simonbyrne added the performance Must go faster label Jul 2, 2015
@yuyichao
Copy link
Contributor

yuyichao commented Jul 2, 2015

Is some solution along this line (use llvmcall) acceptable?

@inline function ltsif64(x::Int64, y::Float64)
    Base.llvmcall("""
                  %fx = sitofp i64 %0 to double
                  %3 = fcmp ogt double %1, 9.223372036854776e18
                  %4 = fcmp olt double %fx, %1
                  %5 = or i1 %3, %4

                  %6 = fcmp oeq double %fx, %1
                  %7 = fptosi double %fx to i64
                  %8 = icmp slt i64 %0, %7
                  %9 = and i1 %6, %8

                  %10 = or i1 %5, %9
                  ret i1 %10
                  """, Bool, Tuple{Int64, Float64}, x, y)
end

println(ltsif64(1, 2.0))

foo1(x) = x < Inf
foo2(x) = ltsif64(x, Inf)

@code_llvm (<)(1, Inf)
@code_llvm ltsif64(1, Inf)

@code_llvm foo1(1)
@code_llvm foo2(1)

Output

true

define i1 @"julia_<_21642"(i64, double) {
top:
  %2 = sitofp i64 %0 to double
  %3 = fcmp olt double %2, %1
  %4 = fcmp oeq double %2, %1
  %5 = fcmp oeq double %2, 0x43E0000000000000
  %6 = fptosi double %2 to i64
  %7 = icmp sgt i64 %6, %0
  %8 = or i1 %5, %7
  %9 = and i1 %4, %8
  %10 = or i1 %3, %9
  ret i1 %10
}

define i1 @julia_ltsif64_21643(i64, double) {
top:
  %fx.i = sitofp i64 %0 to double
  %2 = fcmp ogt double %1, 0x43E0000000000000
  %3 = fcmp olt double %fx.i, %1
  %4 = or i1 %2, %3
  %5 = fcmp oeq double %fx.i, %1
  %6 = fptosi double %fx.i to i64
  %7 = icmp sgt i64 %6, %0
  %8 = and i1 %5, %7
  %9 = or i1 %4, %8
  ret i1 %9
}

define i1 @julia_foo1_21645(i64) {
top:
  %1 = sitofp i64 %0 to double
  %2 = fcmp olt double %1, 0x7FF0000000000000
  %3 = fcmp oeq double %1, 0x7FF0000000000000
  %4 = fcmp oeq double %1, 0x43E0000000000000
  %5 = fptosi double %1 to i64
  %6 = icmp sgt i64 %5, %0
  %7 = or i1 %4, %6
  %8 = and i1 %3, %7
  %9 = or i1 %2, %8
  ret i1 %9
}

define i1 @julia_foo2_21646(i64) {
top:
  ret i1 true
}

The output of the comparison is slightly different. Not sure if there was some bug fix or if this is the reason llvm doesn't optimize it.

@yuyichao
Copy link
Contributor

yuyichao commented Jul 2, 2015

This can probably be combined with staged function to make the code more compact...

@simonbyrne
Copy link
Contributor Author

@yuyichao The problem is that it doesn't optimise to the most efficient code when the integer part is constant (which was the reason for my original change):

julia> ispos1(y) = 0<y; @code_llvm ispos1(1.0)

define i1 @julia_ispos1_20586(double) {
top:
  %1 = fcmp ogt double %0, 0.000000e+00
  ret i1 %1
}

julia> ispos2(y) = ltsif64(0,y); @code_llvm ispos2(1.0)

define i1 @julia_ispos2_20588(double) {
top:
  %1 = fcmp ogt double %0, 0x43E0000000000000
  %2 = fcmp ogt double %0, 0.000000e+00
  %3 = or i1 %1, %2
  ret i1 %3
}

@JeffBezanson
Copy link
Member

JeffBezanson commented Jul 2, 2015 via email

@simonbyrne
Copy link
Contributor Author

So it seems that LLVM does do range analysis for integer values, just not floating point values:

julia> foo(x) = (x>1) | (x>3)
foo (generic function with 1 method)

julia> @code_llvm foo(1)

define i1 @julia_foo_20617(i64) {
top:
  %1 = icmp sgt i64 %0, 1
  ret i1 %1
}

julia> @code_llvm foo(1.0)

define i1 @julia_foo_20618(double) {
top:
  %1 = fcmp ogt double %0, 1.000000e+00
  %2 = fcmp ogt double %0, 3.000000e+00
  %3 = or i1 %1, %2
  ret i1 %3
}

A possible JSOC/GSOC project?

@yuyichao
Copy link
Contributor

yuyichao commented Jul 2, 2015

Actually what's the reason for doing those extra comparason in julia? Clang is just generating a simple cast and comparason for this.

@simonbyrne
Copy link
Contributor Author

The main reason is so that all numeric comparisons are mathematically exact, e.g.

julia> typemax(Int64) < Float64(typemax(Int64))
true

@yuyichao
Copy link
Contributor

yuyichao commented Jul 2, 2015

Is LLVM doing that propagation at a lower level? The two version seems to be giving identical native code.

Edit: Sorry I looked at the wrong version.

@simonbyrne
Copy link
Contributor Author

Hmm, that's not what I'm getting:

julia> @code_native foo3(1.0)
    .section    __TEXT,__text,regular,pure_instructions
Filename: none
Source line: 1
    pushq   %rbp
    movq    %rsp, %rbp
    xorpd   %xmm1, %xmm1
    ucomisd %xmm1, %xmm0
Source line: 1
    seta    %al
    popq    %rbp
    ret

julia> @code_native foo4(1.0)
    .section    __TEXT,__text,regular,pure_instructions
Filename: none
Source line: 1
    pushq   %rbp
    movq    %rsp, %rbp
    xorpd   %xmm1, %xmm1
    ucomisd %xmm1, %xmm0
    seta    %cl
    movabsq $13024516864, %rax      ## imm = 0x308525B00
    ucomisd (%rax), %xmm0
    seta    %al
    orb %cl, %al
Source line: 1
    popq    %rbp
    ret

julia> versioninfo()
Julia Version 0.4.0-dev+4921
Commit 6df5820* (2015-05-21 03:23 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.3.0)
  CPU: Intel(R) Core(TM) i5 CPU         750  @ 2.67GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

@ArchRobison
Copy link
Contributor

Summary: this is a job for LLVM, and most of the logic is already there.

LLVM does some range analysis for integers but not floating-point. I prototyped a simple floating-point range analysis a while back but never pursued getting it into LLVM since there was a narrower solution for my issue.

However, LLVM has a pass ["instruction combine"] that does peephole optimization of SSA expressions, and that would seem sufficient to deal with Simon's example. Given:

define i1 @julia_foo_20585(i64) {
top:
  %1 = sitofp i64 %0 to double
  %2 = fcmp olt double %1, 0x7FF0000000000000
  %3 = fcmp oeq double %1, 0x7FF0000000000000
  ...
  %8 = and i1 %3, %7
  %9 = or i1 %2, %8
  ret i1 %9
}

it should be straightforward to add rules to fold %2 to true, which suffices to simplify %9 to true. Clearly the rule can be generalized to comparisons against any constant that does not fit in an i64. While patching LLVM, it's probably worth adding a rule to fold %3 to false too.

I looked at LLVM svn (and 3.3) and it looks like there is a routine in
lib/Transforms/InstCombineCompares.cpp to do this. Comments therein say:

/// FoldFCmp_IntToFP_Cst -  Fold fcmp ([us]itofp x,  cst) if possible

Alas the routine punts if the integer is wider than the mantissa after conversion. We should be able to refine it to not punt if the constant has such a large magnitude that loss of precision in the input is irrelevant.

@ArchRobison
Copy link
Contributor

I got the infinity case working with a two line change to lib/Transforms/InstCombineCompares.cpp. Though now I need to find out the right way to exclude the half-precision (16-bit float) case because some integers do convert to half-precision infinity.

@simonbyrne
Copy link
Contributor Author

Ah, very nice.

Also, UInt128 and Float32 will have the same problem:

julia> Float32(typemax(UInt128))
Inf32

On a different note: I didn't realise that there were also inbuilt optimisations for the LLVM math intrinsics (exp etc.), which is a pity, as we don't make use of all of them, instead calling the libm functions directly. It would be really neat if we could instead provide these sorts of hints at the Julia level, which would also allow us to apply these to our own functions, e.g. something like:

function sinh(x::Float64)
    y = ccall((:sinh,libm), Float64, (Float64,), x)
    @llvm_assert CannotBeOrderedLessThanZero(y) = CannotBeOrderedLessThanZero(x)
    return y
end

@JeffBezanson JeffBezanson added the compiler:codegen Generation of LLVM IR and native code label Jul 3, 2015
@ArchRobison
Copy link
Contributor

I found out that LLVM supports integers with 2^23-1 bits, so even Float64 could have a problem as far as LLVM is concerned. I'll work out the right conditions and run them by Simon for review.

We really should be calling the LLVM intrinsics so we can exploit the inbuilt optimizations. I recall there was some issue having to do with resolving the intrinsics to Julia's libm during linking. We ought to get that fixed, or as a work-around, add a late pass that maps the intrinsics back to Julia's libm.

@ArchRobison
Copy link
Contributor

I've posted a patch to the LLVM commits mailing list for review. @simonbyrne: it might be worth looking it over to check my logic, Even if you're not familiar with LLVM internals, I think the logic should be understandable. You'll probably need to create a Phabricator login to get at the review diff. I've put a copy of the diff here.

@simonbyrne
Copy link
Contributor Author

I just got around to looking at this today: from what I can tell, it seems correct. Two ways it could be tightened further:

  • depend on the comparison: e.g. Float64(x::Int64) <= 0x1p63 is always true, but Float64(x::Int64) < 0x1p63 is not; same for Float16(x::Int64) <= Inf16 vs Float16(x::Int64) < Inf16.
  • check if the mantissa is 1 when Exp == (int)InputSize - !LHSUnsigned).

I don't know if these are worth the effort though.

@ArchRobison
Copy link
Contributor

@simonbyrne : Thanks for the review. For sake of simplicity I'll skip the tightening.

@ArchRobison
Copy link
Contributor

Fixed in LLVM trunk by revision r247708 on 2015-09-15, which committed the patch http://reviews.llvm.org/D11210. I won't close the issue until we decide whether to backport the fix to our patch for LLVM 3.3.

@tkelman
Copy link
Contributor

tkelman commented Sep 16, 2015

It would be worth nominating that for 3.7.1 if it's non disruptive

@KristofferC
Copy link
Member

KristofferC commented Jan 22, 2017

Fixed on master.

julia> foo(x) = x < Inf; @code_llvm foo(1)

define i8 @julia_foo_65770(i64) #0 !dbg !5 {
top:
  ret i8 1
}

@simonbyrne
Copy link
Contributor Author

Any way we can test for this?

@KristofferC
Copy link
Member

We don't really have codegen tests, see #13777 for some relevant discussion. Adding it to Nanosoldier is probably the most straightforward.

@tkelman tkelman added the potential benchmark Could make a good benchmark in BaseBenchmarks label Jan 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code performance Must go faster potential benchmark Could make a good benchmark in BaseBenchmarks
Projects
None yet
Development

No branches or pull requests

6 participants