
Move parallel/async lowering from LLVM codegen to a standard Halide IR lowering pass. #6195

Merged 85 commits into master from factor_parallel_codegen on Dec 14, 2021

Conversation

@zvookin (Member) commented Aug 12, 2021:

Time to get code review started on this, as it will likely require more effort than average; the sooner this gets solid, the better.

The basic idea here is pretty simple: move codegen for parallel for loops and async chains from being done at the LLVM IR level to doing it at the Halide IR level. The direct goal is to allow using our concurrent runtime via the C++ backend. A secondary goal is to make the closure generation machinery reusable in other places.

This is all plumbed through but there are two issues that need to be addressed before it can be merged:

  1. The method for determining whether to use the original halide_do_par_for depends on an LLVM codegen symbol table lookup that cannot work at the Halide IR level. I believe this should be resolvable by doing a walk of the Halide IR tree, but haven't yet implemented this. (I will do so shortly.)

  2. Some tests use the call graph checker to validate a dependency graph of logical calls between Funcs. The checker does not recover the full set of dependencies across lowered functions when closures are made for parallel or async scheduled Funcs. @abadams and I will have to figure out how to handle this. For now the failing tests are disabled with a TODO. (correctness_func_wrapper, correctness_image_wrapper, and correctness_rfactor are the tests in question.)

Making this work required adding some intrinsics to declare and populate typed structures. The main technique here is that declaring a struct returns a properly typed null pointer to that struct, and the fully elaborated backing type can be recovered from this null pointer. (In LLVM this is just using the getType() method on the Value; in C++ it is recovered using decltype.) This works around Halide's type system not being rich enough to handle everything in C++ (or LLVM, for that matter). It might be possible to register more detailed type info for Handle types for a given compilation, but that seemed more complicated and would likely require registering dynamically created data with the compiler for an unknown lifetime, posing a memory management problem.
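The null-pointer trick described above can be illustrated with a minimal standalone sketch (not the actual Halide code; the struct and function names here are purely illustrative):

```cpp
#include <type_traits>

// A typed null pointer carries a fully elaborated struct type; the C++
// backend can later recover that type from the value alone via decltype.
struct closure_example {
    int min;
    int extent;
};

int demo() {
    // "Declaring" the struct yields a properly typed null pointer...
    closure_example *type_carrier = nullptr;
    // ...and the full type is recovered from that value, not from a name:
    using recovered = std::remove_pointer<decltype(type_carrier)>::type;
    recovered instance{3, 8};
    return instance.min + instance.extent;
}
```

No cast between a type and a value is involved; the value exists only so that decltype (or getType() in LLVM) has something to inspect.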

New intrinsics are make_struct_type, make_typed_struct, load_struct_member, resolve_function_name, get_user_context, and get_pointer_symbol_or_null. I will not defend this set of additions as a shining example of design, but it also certainly isn't lowering the bar for Halide intrinsics. Happy to consider alternative approaches if folks have ideas.

@dsharletg (Contributor) left a comment:

Just a quick first pass through this.

@alexreinking (Member) commented:

Looks like you need to update src/CMakeLists.txt in the same way you updated the Makefile.

- normalize local functions to snake_case
- put all local functions & classes in anon namespace
- move MinThreads visitor to file scope to reduce nestedness of code
@zvookin (Member, Author) commented Dec 2, 2021:

> I'm totally confused by the need to pass types as values here. I'm not sure why closure generation and passing would require that.

(The code is not literally passing types as values, rather it creates a value and recovers its type at another point in the generated code. Yes the value is only used to carry the type, but it is perfectly type safe and uses well established mechanisms in LLVM and C++. I want to make sure it is understood there is no weird casting between llvm::Type * and llvm::Value * or anything like that.)

Closure creation is effectively:

```cpp
struct closure_struct_unique_name {
    // fields of this specific closure
};

int closure_callback_unique_name(void *closure_data_arg, ...) {
    closure_struct_unique_name *closure_data = (closure_struct_unique_name *)closure_data_arg;
    // unpack fields from closure_data.
    // ...
}

int some_halide_kernel(...) {
    closure_struct_unique_name closure_data = { ... };
    halide_do_par_for_or_something((void *)&closure_data, ...);
}
```

That needs to be modeled in Halide IR and codegened into LLVM IR and C++. I chose to do this by having a call to declare the type and return it as a typed pointer to that type. (One can declare just the type, and either with its fields or as a forward declaration to reference a preexisting type e.g. in the runtime. In that case a nullptr is returned. One can also declare the type as part of constructing a value of that type. Either value will serve to represent the type later.) Later codegen recovers the type from the pointer either using getType() in LLVM or decltype in C++. This is typesafe and allows adding other ways to produce such types in the future.

We also need to handle calling routines with pointer-to-structure-type arguments, specifically in the runtime. This is transitive: in some cases, the pointer is inside another structure or array.

Other alternatives would be to binary pack everything and not use struct types at all. Except that we still need struct types when calling runtime routines that take a pointer to a specific structure type. The backend code needs to typecheck. So we have to pass the type somehow or recover it from the reference to the called runtime function. The latter is easy to do in LLVM, but very hard to do in C++.

In this alternative model, we almost certainly end up representing the struct type with a string that is its name. Though it probably is actually a string plus an int because it may be an array. This only works if the names are unique except not so unique that they don't refer back to the thing one wants them to when one wants them to. (E.g. the infamous C++ error messages about struct foo * not being type compatible with struct foo *. )

If LLVM were to completely eliminate types on pointers, it might be possible to just nerf this mechanism into non-existence for that backend and have things compile fine. I'll be a bit surprised if they remove pointer types to that degree, but if so, not a problem.

```cpp
for (const auto &b : closure.buffers) {
    Expr ptr_var = Variable::make(type_of<void *>(), b.first);
    closure_elements.emplace_back(ptr_var);
    closure_elements.emplace_back(Call::make(type_of<halide_buffer_t *>(), Call::get_pointer_symbol_or_null,
```
Review comment (Member):

Why isn't this Variable::make(type_of<halide_buffer_t *>(), b.first + ".buffer")?

Why might the buffer exist sometimes and not exist other times, and it's OK if it doesn't exist?

If the buffer might not exist and that's OK for some reason, how about keeping a Scope<Expr> of all containing letstmts and using scope.contains(buffer_name) ? scope.get(buffer_name) : reinterpret(Handle(), 0)
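The scope-based fallback suggested above can be sketched in isolation. This is a rough standalone illustration, with std::map standing in for Halide's Scope<Expr> and a null pointer standing in for reinterpret(Handle(), 0); the names are hypothetical, not Halide API:

```cpp
#include <map>
#include <string>

// Stand-in for a Scope<Expr> tracking the values bound by containing LetStmts.
using SymbolScope = std::map<std::string, const void *>;

// If the symbol is bound in an enclosing scope, use its value;
// otherwise fall back to a null handle.
const void *symbol_or_null(const SymbolScope &scope, const std::string &name) {
    auto it = scope.find(name);
    return it != scope.end() ? it->second : nullptr;
}
```

The point of the suggestion is that presence or absence is decided by walking the IR's own let bindings rather than by a codegen-time symbol-table lookup.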

```cpp
// Allocate task list array
Expr tasks_list = Call::make(Handle(), Call::make_typed_struct, tasks_array_args, Call::PureIntrinsic);
Expr user_context = Call::make(type_of<void *>(), Call::get_user_context, {}, Call::PureIntrinsic);
Expr task_parent = Call::make(Handle(), Call::get_pointer_symbol_or_null,
```
Review comment (Member):

Same comment applies about keeping a Scope<Expr> instead

@abadams (Member) commented Dec 3, 2021:

What do you think about having a list of struct types inside the Module object instead of in the Stmt IR? A struct type would have a name and be either a primitive type or a vector of struct types. Then we'd have just the make_typed_struct and load_typed_struct_member intrinsics (or IR nodes), and they could refer to these global struct declarations by name.

The structs themselves would be passed around as Exprs of Handle() type. I think we should assume LLVM will eventually remove all types from pointer values. That's the plan AFAIK, and the API reflects that. The loads, stores, and GEPs are typed, but not the values themselves.

Finally, for function names as Handle()-valued Exprs, I think we should just use a Variable node with that name and type Handle(). These are symbols that are supposed to be in scope, so Variable is appropriate. At the start of codegenning a Module we could add declarations for all the LoweredFuncs in it and then add the corresponding names to the symbol table.

* Minor cleanup of parallel refactor intrinsics

- Renamed `load_struct_member` to `load_typed_struct_member` to make it more clear that it is intended for use only with the results of `make_typed_struct`.
- Split `declare_struct_type` into two intrinsics, `define_typed_struct` and `forward_declare_typed_struct`, removing the need for the underdocumented `mode` argument and hopefully making usage clearer
- Added / clarified comments for the intrinsics modified above

* Update comments

* Fix comments
```cpp
} else if (op->is_intrinsic(Call::define_typed_struct)) {
    // Emit a declaration like:
    //
    // struct {const int f_0, const char f_1, const int f_2} foo;
```
Review comment (Contributor):

Shouldn't this be:

```cpp
struct foo {const int f_0; const char f_1; const int f_2;};
```

?

Reply (Contributor):

Yep, my bad, will fix

```cpp
wrapped_body = LetStmt::make(b.first,
                             Call::make(type_of<void *>(), Call::load_struct_member,
                                        {closure_arg, struct_type, make_const(UInt(32), struct_index)},
                                        Call::PureIntrinsic),
```
Review comment (Contributor):

Resolved?

@steven-johnson (Contributor) commented:

> What do you think about having a list of struct types inside the Module object instead of in the Stmt IR?

My first thought is that it would add new management challenges to the IR; currently our IR does a good job of being standalone at any point in the tree, but this would introduce a class of IR nodes that could only be correctly used inside a properly-configured Module.

@abadams (Member) commented Dec 6, 2021:

It's not without precedent: A Buffer object, which is referred to by name, can only be used inside a correctly configured Module. Modules are like C translation units.

This is a struct type that has to be shared between multiple LoweredFunc objects. It exists in the same container that holds those LoweredFuncs, not within the LoweredFuncs themselves. By way of analogy, in C if you have a struct type shared by multiple functions, you put that struct type declaration outside of the functions.
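The C analogy above can be made concrete with a tiny sketch (illustrative names only): the shared struct type lives at translation-unit scope, outside the functions that use it, just as the proposal puts shared struct types in the Module rather than in each LoweredFunc.

```cpp
// Declared once at translation-unit scope, shared by multiple functions —
// the analogue of holding the struct type in the Module.
struct shared_closure {
    int min;
    int extent;
};

int producer(const shared_closure *c) { return c->min; }
int consumer(const shared_closure *c) { return c->extent; }
```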

@zvookin (Member, Author) commented Dec 7, 2021:

I have no doubt there is a better design to add aggregate types to the IR, but I have trouble seeing how adding structs at the module level is any better than this current approach. E.g. at least this design prints out all the information in debug output. I also don't see a whole lot of cost in changing this later when we have a better design, thus it seems a good first step on the way to something else.

We kind of need this support to deliver certain things, and it's been in code review since mid-August, so if there are concrete objections to how it works I'd like to focus on those. Moving the struct types out a level could easily be done in a second pass, though if we're going to do that, I'd suggest actually adding scoping as well.

@abadams (Member) commented Dec 7, 2021:

My concrete objections to the design are the declaration of types inside imperative Stmt IR, which doesn't mentally type check for me, and the use of StringImms to refer to things in some sort of notional scope, which has caused us considerable pain in the past because passes that track dependencies between bits of IR have to special-case the StringImms.

I want to add the minimum number of new intrinsics. I think we can do this with two or three, not six.

@abadams (Member) commented Dec 7, 2021:

If you don't want to put things in Module, then I woke up with some ideas that would meet my design constraints:

Option 1:

- Make a StructType object, which is either a Type or a list of StructType objects, held behind an IntrusivePtr. There's also an optional name field, which is empty for anonymous types.
- Make new IR nodes MakeTypedStruct and LoadStructMember, both of which have StructType fields that give the type. MakeTypedStruct also has a vector of Expr and is of type Handle(). LoadStructMember also takes a single int giving the index and is of the type of that struct field.

Option 2:

- Make a StructType object as above, and add it as an optionally-defined field to Type. There can exist Handle types with the StructType field defined.
- Have the existing intrinsic make_struct have the StructType field set on its Type. The args are just the members.
- Add a new intrinsic load_struct_member which takes something of type Handle, which must have the StructType field set on its Type, and an int that gives the index of the struct member.

With both of these options, you would discover the StructTypes while traversing the IR. In the backends you would add a declaration of them to the preamble as you discover them, if you haven't hit that StructType yet. We'd change the existing uses of the make_struct intrinsic accordingly (as a follow-up PR).

Both options avoid the StringImms and avoid an imperative piece of IR used to declare a type.
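The recursive StructType described in the proposal could look roughly like the following. This is a hypothetical sketch under the stated constraints (a type is either a primitive leaf or a list of nested StructTypes, with an optional name); the field names, the string-based leaf encoding, and shared_ptr in place of Halide's IntrusivePtr are all illustrative choices, not a committed design:

```cpp
#include <memory>
#include <string>
#include <vector>

// Either a primitive (leaf) type or an aggregate of nested StructTypes.
struct StructType {
    std::string name;       // empty for anonymous types
    std::string primitive;  // non-empty for leaf types, e.g. "int32"
    std::vector<std::shared_ptr<StructType>> members;  // non-empty for aggregates

    bool is_leaf() const { return members.empty(); }
};

// Build a named struct with two primitive members.
std::shared_ptr<StructType> make_example() {
    auto i32 = std::make_shared<StructType>(StructType{"", "int32", {}});
    auto ptr = std::make_shared<StructType>(StructType{"", "handle", {}});
    return std::make_shared<StructType>(
        StructType{"closure_t", "", {i32, ptr}});
}
```

A backend walking the IR would emit a declaration the first time it encounters a given StructType, matching the discover-as-you-traverse scheme described above.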

This is a direct adaptation of what #6481 does for the old LLVM-based code, and allows elimination of one use of `get_pointer_or_null()`. PR is meant to be merged into factor_parallel_codegen, not master.
@zvookin (Member, Author) commented Dec 8, 2021:

My position is that this PR is a meaningful improvement in that it moves a significant component of codegen from the LLVM level to the IR level and allows that to be reused across backends. The intrinsics to me are scaffolding and I am in agreement that they are suboptimal, but they are also fine for now. The question is whether the best path forward is to merge the PR and then work on moving the intrinsics to e.g. new IR nodes or if we hold it up for another couple months to do that. The bar for introducing new IR nodes is much higher as it introduces code changes in a lot more places. Removing intrinsics is pretty easy in comparison to adding/modifying/removing an IR node.

E.g. the get_symbol_pointer_or_null thing was to mirror code that was in the LLVM backend and I didn't want to change the behavior because I needed to get the whole thing working and passing tests. (The behavior was also used in a couple other places originally, one for user_context and the other for task_parent. Both of those had other solutions, but the LLVM code really did do a check on the scoped values to implement them.) It turns out the main reason this intrinsic was there is probably dead code anyway and hey, maybe that could have been figured out earlier, but I have enough trouble with not doing things because I'm trying for perfection rather than getting it done. Nuking get_symbol_pointer_or_null is great, but it would have been fine to do that after landing the PR too. (And probably more conservative in that if the .buffer maybe/maybe not code is there for a reason, it would give a place to drop back to without backing out this entire PR.)

I am happy to open issues and add TODOs and even to commit to doing the work to get rid of the intrinsics in some timeframe. But really I do not think they are enough of a problem to create a maintenance burden. Even the LLVM opaque pointer change implies mostly just removing code. (At the cost of losing type checking at the LLVM IR layer.) At worst it requires adding a bit of type info to the scope store.

@abadams (Member) commented Dec 9, 2021:

I ran out of time today, but I'm working up one of my StructType proposals. We can take a look at it and see if it's better or worse.

steven-johnson and others added 3 commits December 9, 2021 11:47
I think this is the right fix for this case; that said, I can't find a single instance in any of our test cases that actually triggers this.
* Rework some of parallel closure lowering to avoid some intrinsics

This version relies more heavily on the existing make_struct, and puts
function names in the Variable namespace as globals.

Co-authored-by: Steven Johnson <srj@google.com>
@steven-johnson (Contributor) left a comment:

LGTM

@steven-johnson (Contributor) commented:

Ready to merge.

@steven-johnson steven-johnson merged commit 46d8ca8 into master Dec 14, 2021
@steven-johnson steven-johnson deleted the factor_parallel_codegen branch December 14, 2021 00:31
steven-johnson added a commit that referenced this pull request Dec 15, 2021
Handles need to be `auto *` for the previous PR to work properly. (This should be refactored more intelligently to reduce code duplication; this is just a quick fix to unbreak.)