[RFC/WIP] Leverage compiler-rt for builtins support of llvm intrinsics #17344

Closed
wants to merge 12 commits

Conversation

@vchuravy (Member) commented Jul 8, 2016

Note:

This feature will remain unavailable until we compile LLVM with CMake and bump our LLVM version to ~~3.8~~ 3.9.

Reason

LLVM will only build compiler-rt with CMake and explicitly turns off building it with autotools. We currently build LLVM as a shared library (shlib), and that only works in combination with CMake from version 3.8 upwards.

Compiler-rt (builtins) provides low-level support for LLVM intrinsics that don't map to appropriate CPU instructions.

> builtins - a simple library that provides an implementation of the low-level target-specific hooks required by code generation and other runtime components. For example, when compiling for a 32-bit target, converting a double to a 64-bit unsigned integer is compiled into a runtime call to the "__fixunsdfdi" function. The builtins library provides optimized implementations of this and other low-level routines, either in target-independent C form, or as heavily-optimized assembly.
>
> builtins provides full support for the libgcc interfaces on supported targets and high-performance, hand-tuned implementations of commonly used functions like __floatundidf in assembly that are dramatically faster than the libgcc implementations. It should be very easy to bring builtins to support a new target by adding the new routines needed by that target.

It provides fallbacks for LLVM intrinsics on platforms where they don't map to an appropriate instruction. As an example, we could use the native CPU instructions to convert Float16 -> Float32 on any platform that supports F16C, while compiler-rt covers the remaining platforms.
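
For illustration, a minimal sketch of what such a software fallback looks like from the Julia side, assuming a loadable compiler-rt that exposes the standard builtin symbols `__extendhfsf2` / `__truncsfhf2` (the names and ABI details here are assumptions for illustration, not what this PR wires up verbatim):

```julia
# Hedged sketch: call the compiler-rt half<->single builtins via ccall.
# Float16 is passed/returned as its raw UInt16 bit pattern, which matches the
# builtins' ABI on most targets (an assumption). On hardware with F16C the
# compiler would instead emit vcvtph2ps/vcvtps2ph and never call these.
h2f(x::Float16) = ccall(:__extendhfsf2, Float32, (UInt16,), reinterpret(UInt16, x))
f2h(x::Float32) = reinterpret(Float16, ccall(:__truncsfhf2, UInt16, (Float32,), x))

h2f(Float16(1.5))   # 1.5f0, provided the symbol resolves to the compiler-rt builtin
```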

Caveats

  • Compiler-rt is built and distributed as a static library, so we have to convert it into a shared library.
  • Tested under Linux x86_64; the build system will need adjustments for other targets.
  • The compiler-rt build system is quite special (see Rust's build system: https://github.com/rust-lang/rust/blob/master/mk/rt.mk).
  • Needs CMake and LLVM 3.8.

Todo

  • Build-system support for Windows/ARM/PPC/Mac
  • ~~Enable building with LLVM_USE_CMAKE = 0~~
  • Support USE_SYSTEM_LLVM = 1
  • Build by default

@maleadt (Member) commented Jul 8, 2016

Some timings on Float16 conversions, on current master:

```julia
julia> @benchmark Float16(1.0f0)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     4.304 ns (0.00% GC)
  median time:      4.312 ns (0.00% GC)
  mean time:        4.342 ns (0.00% GC)
  maximum time:     7.633 ns (0.00% GC)

julia> @benchmark Float16(1.0)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     4.306 ns (0.00% GC)
  median time:      4.313 ns (0.00% GC)
  mean time:        4.345 ns (0.00% GC)
  maximum time:     8.068 ns (0.00% GC)

julia> @benchmark Float32(Float16(1.0))
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     6.799 ns (0.00% GC)
  median time:      6.808 ns (0.00% GC)
  mean time:        7.015 ns (0.00% GC)
  maximum time:     12.528 ns (0.00% GC)

julia> @benchmark Float64(Float16(1.0))
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     6.787 ns (0.00% GC)
  median time:      6.792 ns (0.00% GC)
  mean time:        6.803 ns (0.00% GC)
  maximum time:     10.169 ns (0.00% GC)

julia> @benchmark Float16(1.0) + 1
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     6.799 ns (0.00% GC)
  median time:      6.808 ns (0.00% GC)
  mean time:        7.152 ns (0.00% GC)
  maximum time:     9.103 ns (0.00% GC)
```

And with this PR:

```julia
julia> @benchmark Float16(1.0f0)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     1.368 ns (0.00% GC)
  median time:      1.371 ns (0.00% GC)
  mean time:        1.373 ns (0.00% GC)
  maximum time:     3.727 ns (0.00% GC)

julia> @benchmark Float16(1.0)
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     1.370 ns (0.00% GC)
  median time:      1.373 ns (0.00% GC)
  mean time:        1.375 ns (0.00% GC)
  maximum time:     4.262 ns (0.00% GC)

julia> @benchmark Float32(Float16(1.0))
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     1.370 ns (0.00% GC)
  median time:      1.373 ns (0.00% GC)
  mean time:        1.375 ns (0.00% GC)
  maximum time:     2.876 ns (0.00% GC)

julia> @benchmark Float64(Float16(1.0))
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     1.369 ns (0.00% GC)
  median time:      1.372 ns (0.00% GC)
  mean time:        1.374 ns (0.00% GC)
  maximum time:     3.667 ns (0.00% GC)

julia> @benchmark Float16(1.0) + 1
BenchmarkTools.Trial: 
  samples:          10000
  evals/sample:     1000
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     1.367 ns (0.00% GC)
  median time:      1.371 ns (0.00% GC)
  mean time:        1.374 ns (0.00% GC)
  maximum time:     4.347 ns (0.00% GC)
```
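
For anyone wanting to reproduce these numbers: the output above comes from BenchmarkTools.jl, so a minimal setup would be something like the following (assuming the package is installed):

```julia
using BenchmarkTools              # package that produced the Trial reports above

@benchmark Float32(Float16(1.0))  # compare minimum/median times against the tables above
```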

```diff
@@ -14,7 +14,7 @@ const NaN = NaN64
 ## conversions to floating-point ##
 convert(::Type{Float16}, x::Integer) = convert(Float16, convert(Float32,x))
 for t in (Int8,Int16,Int32,Int64,Int128,UInt8,UInt16,UInt32,UInt64,UInt128)
-    @eval promote_rule(::Type{Float16}, ::Type{$t}) = Float32
+    @eval promote_rule(::Type{Float16}, ::Type{$t}) = Float16
```
Contributor:
This should probably be a different PR

Member Author:
Yes, I should have noted that this is #17297.
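
For context, a small sketch of what this promotion change means in practice (illustrative only; the before/after comments state what the rule change implies, not measured output):

```julia
# With the promote_rule above returning Float16, mixed Float16/Integer
# arithmetic stays in Float16 instead of widening to Float32:
promote_type(Float16, Int64)   # Float32 before this change, Float16 after
typeof(Float16(1.0) + 1)       # Float32 before this change, Float16 after
```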

@vchuravy (Member Author) commented Jul 9, 2016

@tkelman thanks for the review; I hope I addressed most of your comments.

One point that I would like to discuss is whether we want the shared-library version of compiler-rt to go into build_private_libdir, since we create it purely so that Julia can resolve symbols from it at runtime; there is no other consumer (clang and rust link to the static version) and we need to distribute it ourselves.

@tkelman (Contributor) commented Jul 10, 2016

Why would we put it in build_private_libdir? That's not really on the search path; from what I can tell, the only thing we put there is the system image.

@vchuravy (Member Author)

The place I think is appropriate is lib/julia, to emphasise that we are the only consumer of that shared library.

@tkelman (Contributor) commented Jul 10, 2016

Shared libraries will be put in lib/julia in the install tree on unix.

vchuravy added 12 commits July 12, 2016 06:02
This allows us to more freely use LLVM intrinsics and have fallbacks in
place for systems where they don't map to instructions.

Currently this requires building with a Make.user containing

```
override BUILD_COMPILER-RT = 1
override LLVM_USE_CMAKE = 1
override USE_LLVM_SHLIB = 0
```

TODO:

* Allow for a system installation
* Figure out what is needed to make this work without LLVM_USE_CMAKE
LLVM intrinsics either map to instructions or to functions in
compiler-rt. If we can't find a symbol, look it up in a shared version
of compiler-rt and resolve it there (see the sketch below).
We need a CMake build that also works with SHLIB=1 for compiler-rt.
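
A minimal sketch of that fallback idea from the Julia side (the real lookup lives in Julia's C runtime; the library name "libcompiler-rt" and the helper below are assumptions for illustration only):

```julia
using Libdl   # stdlib in current Julia; was part of Base when this PR was written

# Hedged sketch: if a builtin such as __truncsfhf2 cannot be resolved normally,
# try a shared build of compiler-rt before giving up.
function resolve_builtin(sym::Symbol)
    hdl = Libdl.dlopen_e("libcompiler-rt")    # assumed name of the shared compiler-rt
    hdl == C_NULL && error("no shared compiler-rt available")
    ptr = Libdl.dlsym_e(hdl, sym)
    ptr == C_NULL && error("$sym not found in compiler-rt")
    return ptr
end

# The returned pointer can then back a ccall, e.g.:
# ccall(resolve_builtin(:__truncsfhf2), UInt16, (Float32,), 1.0f0)
```
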
@vchuravy (Member Author)

I will leave this branch up for now, but I currently don't see a way of doing this properly (without losing shlib support) other than switching to 3.9.

```diff
@@ -1,4 +1,4 @@
-LLVM_VER = 3.7.1
+LLVM_VER = 3.8.0
```
@tkelman (Contributor) commented Jul 11, 2016
we'll probably do 3.8.1 some time not too long after branching for 0.6-dev, but it'll need CI preparation

edit: and checksum updates

@vchuravy (Member Author) commented Sep 2, 2016

For the build system we could try to use the Makefile from https://github.com/ReservedField/arm-compiler-rt/ to make our lives easier (not using CMake).

@vchuravy (Member Author) commented Oct 6, 2016

#18734 has the basic support for compiler-rt, and I plan to open a PR for full Float16 support after that gets merged.

@vchuravy vchuravy closed this Oct 6, 2016