Experimental Tapir support #31086

Closed
wants to merge 21 commits into from

@vchuravy (Member) commented Feb 15, 2019

Introduction -- What is Tapir

Tapir is a parallel IR extension to LLVM. For those interested, I recommend perusing the Tapir paper. The key takeaway is that parallel (non-concurrent) programs can be effectively modeled with Cilk-style task parallelism, and that given the serial-projection property (serial execution is always a valid execution), it is possible to reason about parallelism in the LLVM compiler.

By doing so, Tapir solves one primary problem: traditionally, introducing parallelism into a program inhibits compiler optimisations. This happens for a variety of reasons, but chiefly because most implementations of parallelism outline parallel thunks early, so the optimizer only sees calls into the runtime and into the program's thunks, without their context. A classical optimisation inhibited by this is loop-invariant code motion. In Julia we encounter a different problem (#15276), in which using a closure to outline a thunk can cause performance issues.
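
The closure problem referenced in #15276 can be observed on stock Julia, without Tapir. The sketch below is only an illustration and not code from this PR; the function names are hypothetical. It contrasts a loop body outlined into a closure that reassigns a captured variable (which Julia then stores in a `Core.Box`) with the same loop written inline.

```julia
# Hypothetical illustration, not code from this PR.

# Outlined version: the loop body lives in a closure. Because `acc` is
# reassigned inside the closure, it is stored in a Core.Box and the
# optimizer loses its concrete type.
function sum_outlined(xs)
    acc = 0
    foreach(x -> (acc += x), xs)
    return acc
end

# Inline version: the optimizer sees the whole loop, and `acc` stays a
# plain local variable.
function sum_inline(xs)
    acc = 0
    for x in xs
        acc += x
    end
    return acc
end

xs = collect(1:1000)
@assert sum_outlined(xs) == sum_inline(xs)
```

Both functions compute the same value; the difference shows up in `@code_warntype sum_outlined(xs)`, where the captured variable is no longer concretely typed.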

Tapir concepts

Syncregion

An opaque token that is used to associate the various parallel IR statements with each other, so that a sync only synchronizes the tasks that it is responsible for. This is important for nested parallelism and for inlining functions that contain parallel constructs.

Detach

Think of this as a "function call" to the parallel region.
The form is detach within %syncregion, %label, %reattach. The label points to the basic block that starts the parallel region. The reattach label points past the corresponding reattach statement and represents the continuation, that is, the execution on the task that is spawning the parallel region.

Reattach

This is the "return" of a parallel region. It reattaches the parallel region to the original code and the label should point to the same basic-block that the reattach label in detach is pointing to.

Sync

Synchronises all tasks with the same syncregion

Goal of this PR

This is very much ongoing research on how best to integrate the ideas from Tapir, and the technology behind it, into Julia. I want to lay a foundation on which we can build and experiment in the future. While the full benefits will only be realised with a Tapir-enabled LLVM build, one of my goals is to bring the concepts of Tapir into the Julia IR and thereby enable us to optimize parallel code in the Julia IR even on an LLVM that doesn't have the Tapir extension. Right now we are in the very early stages of supporting Tapir in Julia.

It is important to note that the semantics of this representation are parallel and not concurrent, and for that reason this will not and cannot replace Julia Tasks. To illustrate the issue, consider the following Julia task code:

@sync begin
    ch1 = Channel(0)
    ch2 = Channel(0)
    @async begin
        take!(ch1)
        put!(ch2, 1)
    end
    @async begin
        put!(ch1, 1)
        take!(ch2)
    end
end

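Doing a serial projection of this code leads to a deadlock: the first block would run to completion before the second one starts, so take!(ch1) waits forever for a put!(ch1, 1) that is never reached.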
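
To make the difference concrete, here is a minimal sketch on stock Julia (no Tapir required). The function names are hypothetical and a `result` reference is added only so the outcome can be checked. The first function runs the handshake with real Tasks and completes. The second models the first step of the serial projection: rather than calling `take!(ch1)` and hanging, it only checks whether that call could succeed.

```julia
# Hypothetical illustration, not code from this PR.

# Concurrent version: the two tasks interleave, so the handshake completes.
function handshake_with_tasks()
    ch1 = Channel{Int}(0)   # unbuffered channels, as in the example above
    ch2 = Channel{Int}(0)
    result = Ref(0)
    @sync begin
        @async begin
            take!(ch1)
            put!(ch2, 1)
        end
        @async begin
            put!(ch1, 1)
            result[] = take!(ch2)
        end
    end
    return result[]
end

# Serial projection: the first block would have to finish before the
# second one starts. Nothing has been put into ch1 at that point, so
# take!(ch1) could never return. We only test for that condition here
# instead of actually blocking.
function serial_projection_would_block()
    ch1 = Channel{Int}(0)
    return !isready(ch1)
end

@assert handshake_with_tasks() == 1
@assert serial_projection_would_block()
```

This program is only correct under concurrent semantics, which is exactly the kind of code the parallel representation in this PR does not aim to cover.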

User interface

In test/tapir.jl I have placed some functions that I have been experimenting with. I do not expect users to use @syncregion, @spawn and @sync_end directly; rather, I expect them to use the prototype implementation of a parallel for loop (@par) and the @sync/@spawn pair.

@par for i in 1:10
    ...
end

function fib(N)
    if N <= 1
        return N
    end
    x = Ref{Int64}()
    @sync begin # different sync than Tasks
        @spawn begin
            x[] = fib(N-2)
        end
        y = fib(N-1)
    end
    return x[] + y
end
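
For comparison, the serial-projection property can be checked on this kind of program with stock Julia. The sketch below is an assumption-laden stand-in, not the PR's implementation: it uses `Threads.@spawn` and `fetch` from Base (Julia 1.3 or later), which create ordinary concurrent Tasks rather than Tapir parallel regions, and the function names are hypothetical. Removing the spawn gives the serial projection, which must compute the same result.

```julia
# Hypothetical illustration, not code from this PR.

# Serial projection: the spawned call is simply executed in place.
function fib_serial(N)
    N <= 1 && return N
    x = fib_serial(N - 2)
    y = fib_serial(N - 1)
    return x + y
end

# Task-based stand-in using Base's Threads.@spawn (ordinary Julia Tasks,
# not the @spawn macro introduced in this PR).
function fib_tasks(N)
    N <= 1 && return N
    t = Threads.@spawn fib_tasks(N - 2)
    y = fib_tasks(N - 1)
    return fetch(t) + y
end

# Every valid parallel execution must agree with the serial one.
@assert fib_serial(15) == fib_tasks(15)
```

Spawning a Task for every recursive call is deliberately naive; it keeps the shape of the example above but is not how one would write this for performance.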

Changes/Current Status

  • Buildsystem support for Tapir/LLVM
  • New expr nodes:
    • syncregion: Obtain a token to synchronize spawned tasks
    • spawn: Spawn a block in a task
    • sync: Synchronize all tasks using the same token
  • New IR nodes:
    • detach: Detach a parallel region
    • reattach: Join a parallel region
  • Codegen support for syncregion, detach, reattach, sync

Examples

TODO:

  • loop information
  • tests!!!
  • fib2
  • early lowering (in codegen) to PARTR
  • late lowering as an LLVM pass to PARTR
  • runtime support for GC/PTLS
  • interpreter
  • cleanup PR

Notes

Make.user

LLVM_VER=svn
USE_TAPIR=1
BUILD_LLVM_CLANG=1
LLVM_GIT_VER="WIP-taskinfo"
LLVM_GIT_VER_CLANG="WIP-csi-tapir-exceptions"
LLVM_GIT_VER_COMPILER_RT="WIP-cilksan-bugfixes"
override CC=gcc-7
override CXX=g++-7

Acknowledgments

Many thanks to T.B. Schardl (@neboat) for the many discussions around Tapir and LLVM.

@vchuravy vchuravy force-pushed the vc/tapir2 branch 2 times, most recently from 4156e00 to 1755d4c Compare February 16, 2019 04:11
@vchuravy
Member Author

Some fun numbers with the fib example. (Note that the overhead of setting up the tasks is the main cost here: the serial runtime without tasks is 0.41s. The same version with Julia tasks OOMs my machine.)

function fib(N)
    if N <= 1
        return N
    end
    token = @syncregion()
    x1 = Ref{Int64}()
    @spawn token begin
        x1[]  = fib(N-1)
    end
    x2 = fib(N-2)
    @sync_end token
    return x1[] + x2
end

1 Workers

julia> @time fib(40)
  4.883457 seconds (5.16 k allocations: 384.174 KiB)

2 Workers (Note my machine has 2 Cores, SMT-2)

julia> @time fib(40)
  2.448542 seconds (5.16 k allocations: 384.174 KiB)
102334155

4 Workers (Note my machine has 2 Cores, SMT-2)

julia> @time fib(40)
  1.952545 seconds (5.16 k allocations: 384.174 KiB)
102334155
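
As a quick reading aid, the timings quoted above work out to roughly a 2x speedup on 2 workers and 2.5x on 4 workers, with the 1-worker Tapir run about 12x slower than the serial version without tasks. The short calculation below uses only the numbers from this comment; the variable names are hypothetical.

```julia
# Timings in seconds, copied from the measurements above.
serial_no_tasks = 0.41
t_1_worker  = 4.883457
t_2_workers = 2.448542
t_4_workers = 1.952545

# Parallel speedup relative to the 1-worker run.
speedup_2 = t_1_worker / t_2_workers
speedup_4 = t_1_worker / t_4_workers

# Cost of task setup relative to the plain serial function.
task_overhead = t_1_worker / serial_no_tasks

println("speedup (2 workers): ", round(speedup_2, digits=2))
println("speedup (4 workers): ", round(speedup_4, digits=2))
println("overhead vs. serial: ", round(task_overhead, digits=2))
```

The modest gain from 2 to 4 workers is consistent with the machine having 2 physical cores with SMT-2.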

@datnamer

How is this positioned with regards to partr?

@StefanKarpinski
Member

Technically, it's independent of partr. It does impact considerations for the design of the threading API, however, so there's some interaction there. Still mostly independent though.

@c42f
Member

c42f commented Feb 26, 2019

This looks really interesting. How does it relate to the structured concurrency ideas expressed in Trio and libdill et al.? (Described, for example in https://trio.discourse.group/t/structured-concurrency-resources/21 and https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful)

@vchuravy
Member Author

Superseded by #39773

@vchuravy vchuravy closed this Feb 21, 2021
@DilumAluthge DilumAluthge deleted the vc/tapir2 branch February 28, 2021 05:53
@DilumAluthge DilumAluthge added experimental multithreading Base.Threads and related functionality parallelism Parallel or distributed computation labels Mar 4, 2021