-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move parallel/async lowering from LLVM codegen to a standard Halide IR lowering pass. #6195
Conversation
generating and calling, into a normal IR to IR lowering pass. Includes adding struct handling intrinsics to LLVM and C++ backends. Still a work in progress.
closure. Ensure all types are carried through exactly the same to both make_struct_type and make_typed_struct.
Fixes to call graph checker. Disable tests related to this while Andrew and I figure out how to get it to work across closures.
bludgeoning types into submission via subterfuge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a quick first pass through this.
Looks like you need to update |
LoweredFunctions in the future.
- normalize local functions to snake_case - put all local functions & classes in anon namespace - move MinThreads visitor to file scope to reduce nestedness of code
(The code is not literally passing types as values, rather it creates a value and recovers its type at another point in the generated code. Yes the value is only used to carry the type, but it is perfectly type safe and uses well established mechanisms in LLVM and C++. I want to make sure it is understood there is no weird casting between Closure creation is effectively:
That needs to be modeled in Halide IR and codegened into LLVM IR and C++. I chose to do this by having a call to declare the type and return it as a typed pointer to that type. (One can declare just the type, and either with its fields or as a forward declaration to reference a preexisting type e.g. in the runtime. In that case a nullptr is returned. One can also declare the type as part of constructing a value of that type. Either value will serve to represent the type later.) Later codegen recovers the type from the pointer either using We also need to handle calling routines with pointer to structure type arguments, specifically in the runtime. This is transitive that in some cases, the pointer is inside another structure or array. Other alternatives would be to binary pack everything and not use struct types at all. Except that we still need struct types when calling runtime routines that take a pointer to a specific structure type. The backend code needs to typecheck. So we have to pass the type somehow or recover it from the reference to the called runtime function. The latter is easy to do in LLVM, but very hard to do in C++. In this alternative model, we almost certainly end up representing the struct type with a string that is its name. Though it probably is actually a string plus an int because it may be an array. This only works if the names are unique except not so unique that they don't refer back to the thing one wants them to when one wants them to. (E.g. the infamous C++ error messages about struct foo * not being type compatible with struct foo *. ) If LLVM were to completely eliminate types on pointers, it might be possible to just nerf this mechanism into non-existence for that backend and have things compile fine. I'll be a bit surprised if they remove pointer types to that degree, but if so, not a problem. |
src/LowerParallelTasks.cpp
Outdated
for (const auto &b : closure.buffers) { | ||
Expr ptr_var = Variable::make(type_of<void *>(), b.first); | ||
closure_elements.emplace_back(ptr_var); | ||
closure_elements.emplace_back(Call::make(type_of<halide_buffer_t *>(), Call::get_pointer_symbol_or_null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this Variable::make(type_of<halide_buffer_t *>(), b.first + ".buffer")
?
Why might the buffer exist sometimes and not exist other times, and it's OK if it doesn't exist?
If the buffer might not exist and that's OK for some reason, how about keeping a Scope<Expr>
of all containing letstmts and using scope.contains(buffer_name) ? scope.get(buffer_name) : reinterpret(Handle(), 0)
src/LowerParallelTasks.cpp
Outdated
// Allocate task list array | ||
Expr tasks_list = Call::make(Handle(), Call::make_typed_struct, tasks_array_args, Call::PureIntrinsic); | ||
Expr user_context = Call::make(type_of<void *>(), Call::get_user_context, {}, Call::PureIntrinsic); | ||
Expr task_parent = Call::make(Handle(), Call::get_pointer_symbol_or_null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment applies about keeping a Scope<Expr>
instead
What do you think about having a list of struct types inside the Module object instead of in the Stmt IR? A struct type would have a name and be either a primitive type or a vector of struct types. Then we'd have just the make_typed_struct and load_typed_struct_member instrinsics (or IR nodes) and they could refer to these global struct declarations by name. The structs themselves would be passed around as Exprs of Handle() type. I think we should assume llvm will eventually remove all types from pointer values. That's the plan AFAIK, and the API reflects that. The loads and stores and GEPs are typed, but not the values themselves. Finally, for function names as Handle()-valued Exprs, I think we should just use a Variable node with that name with type Handle(). These are symbols that are supposed to be in scope, so Variable is appropriate. At the start of codegenning a module we could add declarations for all the loweredfuncs in it and then add the corresponding names to the symbol table. |
* Minor cleanup of parallel refactor intrinsics - Renamed `load_struct_member` to `load_typed_struct_member` to make it more clear that it is intended for use only with the results of `make_typed_struct`. - Split `declare_struct_type` into two intrinsics, `define_typed_struct` and `forward_declare_typed_struct`, removing the need for the underdocumented `mode` argument and hopefully making usage clearer - Added / clarified comments for the intrinsics modified above * Update comments * Fix comments
src/CodeGen_C.cpp
Outdated
} else if (op->is_intrinsic(Call::define_typed_struct)) { | ||
// Emit a declaration like: | ||
// | ||
// struct {const int f_0, const char f_1, const int f_2} foo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be:
struct foo {const int f_0; const char f_1; const int f_2;};
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, my bad, will fix
src/LowerParallelTasks.cpp
Outdated
wrapped_body = LetStmt::make(b.first, | ||
Call::make(type_of<void *>(), Call::load_struct_member, | ||
{closure_arg, struct_type, make_const(UInt(32), struct_index)}, | ||
Call::PureIntrinsic), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved?
My first thought is that it would add new management challenges to the IR; currently our IR does a good job of being standalone at any point in the tree, but this would introduce a class of IR nodes that could only be correctly used inside a properly-configured Module. |
It's not without precedent: A Buffer object, which is referred to by name, can only be used inside a correctly configured Module. Modules are like C translation units. This is a struct type that has to be shared between multiple LoweredFunc objects. It exists in the same container that holds those LoweredFuncs, not within the LoweredFuncs themselves. By way of analogy, in C if you have a struct type shared by multiple functions, you put that struct type declaration outside of the functions. |
I have no doubt there is a better design to add aggregate types to the IR, but I have trouble seeing how adding structs at the module level is any better than this current approach. E.g. at least this design prints out all the information in debug output. I also don't see a whole lot of cost in changing this later when we have a better design, thus it seems a good first step on the way to something else. We kind of need this support to deliver certain things and it's been in code review since mid-August so if there are concrete objections to how it works I'd like to focus on those. Moving the struct types out a level could easily be done in a second pass, though if we're going to do that, I'd suggest actually adding scoping as well. |
My concrete objections to the design are the declaration of types inside imperative Stmt IR, which doesn't mentally type check for me, and the use of StringImms to refer to things in some sort of notional scope, which has caused us considerable pain in the past because passes that track dependencies between bits of IR have to special-case the stringimms. I want to add the minimum number of new intrinsics. I think we can do this with 2/3, not six. |
If you don't want to put things in Module, then I woke up with some ideas that would meet my design constraints: Make a StructType object, which is either a Type or a list of StructType objects, held behind an Intrusive Ptr. There's also an optional name field, which is empty for anonymous types. Make new IR nodes MakeTypedStruct and LoadStructMember, both of which have StructType fields that give the type. MakeTypedStruct also has a vector of Expr. It is of type Handle(). LoadStructMember also takes a single int giving the index. It is of the type of that struct field. Make a StructType object as above, and add it as an optionally-defined field to Type. There can exist Handle types with the StructType field defined. Have the existing Intrinsic make_struct have the StructType field set on its Type. The args are just the members. With both of these options, you would discover the StructTypes while traversing the IR. In the backends you would add a declaration of them to the preamble as you discover them, if you haven't hit that StructType yet. We'd change the existing uses of the make_struct intrinsic accordingly (as a follow-up PR). Both options avoid the StringImms and avoid an imperative piece of IR used to declare a type. |
This is a direct adaptation of what #6481 does for the old LLVM-based code, and allows elimination of one use of `get_pointer_or_null()`. PR is meant to be merged into factor_parallel_codegen, not master.
My position is that this PR is a meaningful improvement in that it moves a significant component of codegen from the LLVM level to the IR level and allows that to be reused across backends. The intrinsics to me are scaffolding and I am in agreement that they are suboptimal, but they are also fine for now. The question is whether the best path forward is to merge the PR and then work on moving the intrinsics to e.g. new IR nodes or if we hold it up for another couple months to do that. The bar for introducing new IR nodes is much higher as it introduces code changes in a lot more places. Removing intrinsics is pretty easy in comparison to adding/modifying/removing an IR node. E.g. the get_symbol_pointer_or_null thing was to mirror code that was in the LLVM backend and I didn't want to change the behavior because I needed to get the whole thing working and passing tests. (The behavior was also used in a couple other places originally, one for user_context and the other for task_parent. Both of those had other solutions, but the LLVM code really did do a check on the scoped values to implement them.) It turns out the main reason this intrinsic was there is probably dead code anyway and hey, maybe that could have been figured out earlier, but I have enough trouble with not doing things because I'm trying for perfection rather than getting it done. Nuking get_symbol_pointer_or_null is great, but it would have been fine to do that after landing the PR too. (And probably more conservative in that if the .buffer maybe/maybe not code is there for a reason, it would give a place to drop back to without backing out this entire PR.) I am happy to open issues and add TODOs and even to commit to doing the work to get rid of the intrinsics in some timeframe. But really I do not think they are enough of a problem to create a maintenance burden. Even the LLVM opaque pointer change implies mostly just removing code. (At the cost of losing type checking at the LLVM IR layer.) At worst it requires adding a bit of type info to the scope store. |
I ran out of time today, but I'm working up one of my StructType proposals. We can take a look at it and see if it's better or worse. |
…-needed get_pointer_or_symbol() intrinsic (#6486)
I think this is the right fix for this case; that said, I can't find a single instance in any of our test cases that actually triggers this.
* Rework some of parallel closure lowering to avoid some intrinsics This version relies more heavily on the existing make_struct, and puts function names in the Variable namespace as globals. Co-authored-by: Steven Johnson <srj@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Ready to merge. |
Handles need to be `auto *` for the previous PR to work properly. (This should be refactored more intelligently to reduce code reuse; this is just a quick-fix to unbreak.)
Time to get code review on this started as it will likely require more effort than average and the sooner this can get solid, the better.
The basic idea here is pretty simple: move codegen for parallel for loops and async chains from being done at the LLVM IR level to doing it at the Halide IR level. The direct goal is to allow using our concurrent runtime via the C++ backend. A secondary goal is to make the closure generation machinery reusable in other places.
This is all plumbed through but there are two issues that need to be addressed before it can be merged:
The method for determining whether to use the original
halide_do_par_for
depends on an LLVM codegen symbol table lookup that cannot work at the Halide IR level. I believe this should be resolvable by doing a walk of the Halide IR tree, but haven't yet implemented this. (I will do so shortly.)Some tests use the call graph checker to validate a dependency graph of logical calls between Funcs. The checker does not recover the full set of dependencies across lowered functions when closures are made for parallel or async scheduled Funcs. @abadams and I will have to figure out how to handle this. For now the failing tests are disabled with a TODO. (correctness_func_wrapper, correctness_image_wrapper, and correctness_rfactor are the tests in question.)
Making this work required adding some intrinsics to declare and populate typed structures. The main technique here is that declaring a struct returns a properly typed null pointer to that struct and the fully elaborated backed type can be recovered from this null pointer. (In LLVM this is just using the
getType()
method on theValue
and in C++ it is recovered using thedecltype
. This is to work around Halide's type system not being rich enough to handle everything in C++ (or LLVM for that matter). It might be possible to register more detailed type info for Handle types for a given compilation, but that seemed more complicated and likely requires registering dynamically created data to the compiler with an unknown lifetime, thus posing a memory management problem.New intrinsics are
make_struct_type
,make_typed_struct
,load_struct_member
,resolve_function_name
,get_user_context
, andget_pointer_symbol_or_null
. I will not defend this set of additions as a shining example of design, but it also certainly isn't lowering the bar for Halide intrinsics. Happy to consider alternative approaches if folks have ideas.