float vs int: optimising away infinity comparisons #11987

simonbyrne · 2015-07-02T10:26:03Z

In 0.3:

julia> foo(x) = x < Inf; @code_llvm foo(1)

define i1 @julia_foo_20227(i64) {
top:
  ret i1 true, !dbg !761
}

In 0.4:

julia> foo(x) = x < Inf; @code_llvm foo(1)

define i1 @julia_foo_20585(i64) {
top:
  %1 = sitofp i64 %0 to double
  %2 = fcmp olt double %1, 0x7FF0000000000000
  %3 = fcmp oeq double %1, 0x7FF0000000000000
  %4 = fcmp oeq double %1, 0x43E0000000000000
  %5 = fptosi double %1 to i64
  %6 = icmp sgt i64 %5, %0
  %7 = or i1 %4, %6
  %8 = and i1 %3, %7
  %9 = or i1 %2, %8
  ret i1 %9
}

The difference is due to a change in #9030 which I made to allow for optimising expressions where the integer is constant, such as x > 0 (which is probably more valuable). But is there a way we can have both?

I assume that this will probably require some sort of range analysis pass at the LLVM level, but maybe someone can come up with a magic sequence that works.

This arose in #11977

The text was updated successfully, but these errors were encountered:

simonbyrne · 2015-07-02T10:30:28Z

For the record, the current function is (after macro expansion):

function lt_new(x::Int, y::Float64)
    fx = Float64(x)
    (fx < y) | ((fx == y) & (( fx == Float64(typemax(Int)) ) | (x < unsafe_trunc(Int, fx) ) ))
end

and the old one was effectively

function lt_old(x::Int, y::Float64)
    fx = Float64(x)
    (Float64(typemax(Int)) <= y) | (fx < y) | ((fx == y) & (x < unsafe_trunc(Int,fx)))
end

yuyichao · 2015-07-02T13:07:40Z

Is some solution along this line (use llvmcall) acceptable?

@inline function ltsif64(x::Int64, y::Float64)
    Base.llvmcall("""
                  %fx = sitofp i64 %0 to double
                  %3 = fcmp ogt double %1, 9.223372036854776e18
                  %4 = fcmp olt double %fx, %1
                  %5 = or i1 %3, %4

                  %6 = fcmp oeq double %fx, %1
                  %7 = fptosi double %fx to i64
                  %8 = icmp slt i64 %0, %7
                  %9 = and i1 %6, %8

                  %10 = or i1 %5, %9
                  ret i1 %10
                  """, Bool, Tuple{Int64, Float64}, x, y)
end

println(ltsif64(1, 2.0))

foo1(x) = x < Inf
foo2(x) = ltsif64(x, Inf)

@code_llvm (<)(1, Inf)
@code_llvm ltsif64(1, Inf)

@code_llvm foo1(1)
@code_llvm foo2(1)

Output

true

define i1 @"julia_<_21642"(i64, double) {
top:
  %2 = sitofp i64 %0 to double
  %3 = fcmp olt double %2, %1
  %4 = fcmp oeq double %2, %1
  %5 = fcmp oeq double %2, 0x43E0000000000000
  %6 = fptosi double %2 to i64
  %7 = icmp sgt i64 %6, %0
  %8 = or i1 %5, %7
  %9 = and i1 %4, %8
  %10 = or i1 %3, %9
  ret i1 %10
}

define i1 @julia_ltsif64_21643(i64, double) {
top:
  %fx.i = sitofp i64 %0 to double
  %2 = fcmp ogt double %1, 0x43E0000000000000
  %3 = fcmp olt double %fx.i, %1
  %4 = or i1 %2, %3
  %5 = fcmp oeq double %fx.i, %1
  %6 = fptosi double %fx.i to i64
  %7 = icmp sgt i64 %6, %0
  %8 = and i1 %5, %7
  %9 = or i1 %4, %8
  ret i1 %9
}

define i1 @julia_foo1_21645(i64) {
top:
  %1 = sitofp i64 %0 to double
  %2 = fcmp olt double %1, 0x7FF0000000000000
  %3 = fcmp oeq double %1, 0x7FF0000000000000
  %4 = fcmp oeq double %1, 0x43E0000000000000
  %5 = fptosi double %1 to i64
  %6 = icmp sgt i64 %5, %0
  %7 = or i1 %4, %6
  %8 = and i1 %3, %7
  %9 = or i1 %2, %8
  ret i1 %9
}

define i1 @julia_foo2_21646(i64) {
top:
  ret i1 true
}

The output of the comparison is slightly different. Not sure if there was some bug fix or if this is the reason llvm doesn't optimize it.

yuyichao · 2015-07-02T13:09:51Z

This can probably be combined with staged function to make the code more compact...

simonbyrne · 2015-07-02T13:13:35Z

@yuyichao The problem is that it doesn't optimise to the most efficient code when the integer part is constant (which was the reason for my original change):

julia> ispos1(y) = 0<y; @code_llvm ispos1(1.0)

define i1 @julia_ispos1_20586(double) {
top:
  %1 = fcmp ogt double %0, 0.000000e+00
  ret i1 %1
}

julia> ispos2(y) = ltsif64(0,y); @code_llvm ispos2(1.0)

define i1 @julia_ispos2_20588(double) {
top:
  %1 = fcmp ogt double %0, 0x43E0000000000000
  %2 = fcmp ogt double %0, 0.000000e+00
  %3 = or i1 %1, %2
  ret i1 %3
}

JeffBezanson · 2015-07-02T13:15:10Z

Please don't use a staged function for something like this!!

simonbyrne · 2015-07-02T13:17:58Z

So it seems that LLVM does do range analysis for integer values, just not floating point values:

julia> foo(x) = (x>1) | (x>3)
foo (generic function with 1 method)

julia> @code_llvm foo(1)

define i1 @julia_foo_20617(i64) {
top:
  %1 = icmp sgt i64 %0, 1
  ret i1 %1
}

julia> @code_llvm foo(1.0)

define i1 @julia_foo_20618(double) {
top:
  %1 = fcmp ogt double %0, 1.000000e+00
  %2 = fcmp ogt double %0, 3.000000e+00
  %3 = or i1 %1, %2
  ret i1 %3
}

A possible JSOC/GSOC project?

yuyichao · 2015-07-02T13:24:14Z

Actually what's the reason for doing those extra comparason in julia? Clang is just generating a simple cast and comparason for this.

simonbyrne · 2015-07-02T13:26:17Z

The main reason is so that all numeric comparisons are mathematically exact, e.g.

julia> typemax(Int64) < Float64(typemax(Int64))
true

yuyichao · 2015-07-02T13:29:38Z

~~Is LLVM doing that propagation at a lower level? The two version seems to be giving identical native code.~~

Edit: Sorry I looked at the wrong version.

simonbyrne · 2015-07-02T13:34:12Z

Hmm, that's not what I'm getting:

julia> @code_native foo3(1.0)
    .section    __TEXT,__text,regular,pure_instructions
Filename: none
Source line: 1
    pushq   %rbp
    movq    %rsp, %rbp
    xorpd   %xmm1, %xmm1
    ucomisd %xmm1, %xmm0
Source line: 1
    seta    %al
    popq    %rbp
    ret

julia> @code_native foo4(1.0)
    .section    __TEXT,__text,regular,pure_instructions
Filename: none
Source line: 1
    pushq   %rbp
    movq    %rsp, %rbp
    xorpd   %xmm1, %xmm1
    ucomisd %xmm1, %xmm0
    seta    %cl
    movabsq $13024516864, %rax      ## imm = 0x308525B00
    ucomisd (%rax), %xmm0
    seta    %al
    orb %cl, %al
Source line: 1
    popq    %rbp
    ret

julia> versioninfo()
Julia Version 0.4.0-dev+4921
Commit 6df5820* (2015-05-21 03:23 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.3.0)
  CPU: Intel(R) Core(TM) i5 CPU         750  @ 2.67GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

ArchRobison · 2015-07-03T03:21:06Z

Summary: this is a job for LLVM, and most of the logic is already there.

LLVM does some range analysis for integers but not floating-point. I prototyped a simple floating-point range analysis a while back but never pursued getting it into LLVM since there was a narrower solution for my issue.

However, LLVM has a pass ["instruction combine"] that does peephole optimization of SSA expressions, and that would seem sufficient to deal with Simon's example. Given:

define i1 @julia_foo_20585(i64) {
top:
  %1 = sitofp i64 %0 to double
  %2 = fcmp olt double %1, 0x7FF0000000000000
  %3 = fcmp oeq double %1, 0x7FF0000000000000
  ...
  %8 = and i1 %3, %7
  %9 = or i1 %2, %8
  ret i1 %9
}

it should be straightforward to add rules to fold %2 to true, which suffices to simplify %9 to true. Clearly the rule can be generalized to comparisons against any constant that does not fit in an i64. While patching LLVM, it's probably worth adding a rule to fold %3 to false too.

I looked at LLVM svn (and 3.3) and it looks like there is a routine in
lib/Transforms/InstCombineCompares.cpp to do this. Comments therein say:

/// FoldFCmp_IntToFP_Cst -  Fold fcmp ([us]itofp x,  cst) if possible

Alas the routine punts if the integer is wider than the mantissa after conversion. We should be able to refine it to not punt if the constant has such a large magnitude that loss of precision in the input is irrelevant.

ArchRobison · 2015-07-03T04:17:04Z

I got the infinity case working with a two line change to lib/Transforms/InstCombineCompares.cpp. Though now I need to find out the right way to exclude the half-precision (16-bit float) case because some integers do convert to half-precision infinity.

simonbyrne · 2015-07-03T09:46:16Z

Ah, very nice.

Also, UInt128 and Float32 will have the same problem:

julia> Float32(typemax(UInt128))
Inf32

On a different note: I didn't realise that there were also inbuilt optimisations for the LLVM math intrinsics (exp etc.), which is a pity, as we don't make use of all of them, instead calling the libm functions directly. It would be really neat if we could instead provide these sorts of hints at the Julia level, which would also allow us to apply these to our own functions, e.g. something like:

function sinh(x::Float64)
    y = ccall((:sinh,libm), Float64, (Float64,), x)
    @llvm_assert CannotBeOrderedLessThanZero(y) = CannotBeOrderedLessThanZero(x)
    return y
end

ArchRobison · 2015-07-04T20:33:29Z

I found out that LLVM supports integers with 2^23-1 bits, so even Float64 could have a problem as far as LLVM is concerned. I'll work out the right conditions and run them by Simon for review.

We really should be calling the LLVM intrinsics so we can exploit the inbuilt optimizations. I recall there was some issue having to do with resolving the intrinsics to Julia's libm during linking. We ought to get that fixed, or as a work-around, add a late pass that maps the intrinsics back to Julia's libm.

ArchRobison · 2015-07-15T02:04:57Z

I've posted a patch to the LLVM commits mailing list for review. @simonbyrne: it might be worth looking it over to check my logic, Even if you're not familiar with LLVM internals, I think the logic should be understandable. You'll probably need to create a Phabricator login to get at the review diff. I've put a copy of the diff here.

simonbyrne · 2015-07-23T20:59:57Z

I just got around to looking at this today: from what I can tell, it seems correct. Two ways it could be tightened further:

depend on the comparison: e.g. Float64(x::Int64) <= 0x1p63 is always true, but Float64(x::Int64) < 0x1p63 is not; same for Float16(x::Int64) <= Inf16 vs Float16(x::Int64) < Inf16.
check if the mantissa is 1 when Exp == (int)InputSize - !LHSUnsigned).

I don't know if these are worth the effort though.

ArchRobison · 2015-07-24T18:40:50Z

@simonbyrne : Thanks for the review. For sake of simplicity I'll skip the tightening.

ArchRobison · 2015-09-16T22:20:27Z

Fixed in LLVM trunk by revision r247708 on 2015-09-15, which committed the patch http://reviews.llvm.org/D11210. I won't close the issue until we decide whether to backport the fix to our patch for LLVM 3.3.

tkelman · 2015-09-16T22:45:30Z

It would be worth nominating that for 3.7.1 if it's non disruptive

KristofferC · 2017-01-22T13:34:37Z

Fixed on master.

julia> foo(x) = x < Inf; @code_llvm foo(1)

define i8 @julia_foo_65770(i64) #0 !dbg !5 {
top:
  ret i8 1
}

simonbyrne · 2017-01-22T13:38:24Z

Any way we can test for this?

KristofferC · 2017-01-22T14:00:00Z

We don't really have codegen tests, see #13777 for some relevant discussion. Adding it to Nanosoldier is probably the most straightforward.

simonbyrne added the performance Must go faster label Jul 2, 2015

JeffBezanson added the compiler:codegen Generation of LLVM IR and native code label Jul 3, 2015

simonbyrne added the Fixed in LLVM 3.8 label Jan 21, 2016

KristofferC closed this as completed Jan 22, 2017

tkelman added the potential benchmark Could make a good benchmark in BaseBenchmarks label Jan 22, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

float vs int: optimising away infinity comparisons #11987

float vs int: optimising away infinity comparisons #11987

simonbyrne commented Jul 2, 2015

simonbyrne commented Jul 2, 2015

yuyichao commented Jul 2, 2015

yuyichao commented Jul 2, 2015

simonbyrne commented Jul 2, 2015

JeffBezanson commented Jul 2, 2015 via email

simonbyrne commented Jul 2, 2015

yuyichao commented Jul 2, 2015

simonbyrne commented Jul 2, 2015

yuyichao commented Jul 2, 2015

simonbyrne commented Jul 2, 2015

ArchRobison commented Jul 3, 2015

ArchRobison commented Jul 3, 2015

simonbyrne commented Jul 3, 2015

ArchRobison commented Jul 4, 2015

ArchRobison commented Jul 15, 2015

simonbyrne commented Jul 23, 2015

ArchRobison commented Jul 24, 2015

ArchRobison commented Sep 16, 2015

tkelman commented Sep 16, 2015

KristofferC commented Jan 22, 2017 •

edited

Loading

simonbyrne commented Jan 22, 2017

KristofferC commented Jan 22, 2017

float vs int: optimising away infinity comparisons #11987

float vs int: optimising away infinity comparisons #11987

Comments

simonbyrne commented Jul 2, 2015

simonbyrne commented Jul 2, 2015

yuyichao commented Jul 2, 2015

yuyichao commented Jul 2, 2015

simonbyrne commented Jul 2, 2015

JeffBezanson commented Jul 2, 2015 via email

simonbyrne commented Jul 2, 2015

yuyichao commented Jul 2, 2015

simonbyrne commented Jul 2, 2015

yuyichao commented Jul 2, 2015

simonbyrne commented Jul 2, 2015

ArchRobison commented Jul 3, 2015

ArchRobison commented Jul 3, 2015

simonbyrne commented Jul 3, 2015

ArchRobison commented Jul 4, 2015

ArchRobison commented Jul 15, 2015

simonbyrne commented Jul 23, 2015

ArchRobison commented Jul 24, 2015

ArchRobison commented Sep 16, 2015

tkelman commented Sep 16, 2015

KristofferC commented Jan 22, 2017 • edited Loading

simonbyrne commented Jan 22, 2017

KristofferC commented Jan 22, 2017

KristofferC commented Jan 22, 2017 •

edited

Loading