Skip to content

Commit

Permalink
Work around LLVM's dickish undefined constant folding behavior.
Browse files Browse the repository at this point in the history
LLVM's fptosi intrinsic is undefined for NaN, so LLVM obnoxiously and
pointlessly does different things when it gets NaN as a run-time value
than as a compile-time value. To avoid this shitty, pointless trap, we
have to avoid calling fptosi on NaN by introducing a branch into the
hashing function – even though for hashing, we don't care *what* value
is produced, just as long as it's consistent. Unfortunately, this
affects the performance of Float64 hashing pretty badly. I was not
able to figure out any way to recover this lost performance. LLVM
really needs to stop doing this.
  • Loading branch information
StefanKarpinski committed Apr 25, 2014
1 parent f791029 commit 3696968
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion base/hashing.jl
Original file line number Diff line number Diff line change
Expand Up @@ -30,10 +30,11 @@ end
## hashing small, built-in numeric types ##

hx(a::Uint64, b::Float64, h::Uint) = hash_uint((3a + reinterpret(Uint64,b)) - h)
const hx_NaN = hx(uint(0), NaN, uint(0))

hash(x::Uint64, h::Uint) = hx(x, float64(x), h)
hash(x::Int64, h::Uint) = hx(reinterpret(Uint64,x), float64(x), h)
hash(x::Float64, h::Uint) = hx(box(Uint64,fptosi(unbox(Float64,x))), ifelse(isnan(x), NaN, x), h)
hash(x::Float64, h::Uint) = isnan(x) ? (hx_NaN $ h) : hx(box(Uint64,fptosi(unbox(Float64,x))), x, h)

This comment has been minimized.

Copy link
@JeffBezanson

JeffBezanson Apr 25, 2014

Member

Did using ifelse here not work?

This comment has been minimized.

Copy link
@StefanKarpinski

StefanKarpinski Apr 25, 2014

Author Member

There are a number of different ways to do it – an ifelse for each argument to hx is slower; a single ifelse for the whole thing is much slower since it calls the expensive hashing function twice. This was the best I could find.

This comment has been minimized.

Copy link
@JeffBezanson

JeffBezanson Apr 25, 2014

Member

Why does it call hx twice?

This comment has been minimized.

Copy link
@StefanKarpinski

StefanKarpinski Apr 25, 2014

Author Member

Which version are you talking about? Here are some of the obvious ones:

isnan(x) ? (hx_NaN $ h) : hx(box(Uint64,fptosi(unbox(Float64,x))), x, h) # current
ifelse(isnan(x), hx_NaN $ h, hx(box(Uint64,fptosi(unbox(Float64,x))), x, h))
ifelse(isnan(x), hx(uint(0), NaN, h), hx(box(Uint64,fptosi(unbox(Float64,x))), x, h))

There are others. I tried everything I could think of. This one was the fastest. Not as fast as it used to be, but still only 2x slower than integer hashing.

This comment has been minimized.

Copy link
@JeffBezanson

JeffBezanson Apr 25, 2014

Member

It's strange that there would be a big difference between the first and second of those. Even stranger if the first one is faster for non-NaNs.


hash(x::Union(Int8,Uint8,Int16,Uint16,Int32,Uint32), h::Uint) = hash(int64(x), h)
hash(x::Float32, h::Uint) = hash(float64(x), h)
Expand Down

2 comments on commit 3696968

@ViralBShah
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would turning off some fast-math stuff in LLVM's codegen help?

@JeffBezanson
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so; the LLVM language spec says the answer is undefined period.

Please sign in to comment.