
Move parallel/async lowering from LLVM codegen to a standard Halide IR lowering pass. #6195

Merged 85 commits into master from factor_parallel_codegen on Dec 14, 2021

Conversation

@zvookin (Member) commented Aug 12, 2021:

Time to get code review started on this, as it will likely require more effort than average; the sooner this gets solid, the better.

The basic idea here is pretty simple: move codegen for parallel for loops and async chains from being done at the LLVM IR level to doing it at the Halide IR level. The direct goal is to allow using our concurrent runtime via the C++ backend. A secondary goal is to make the closure generation machinery reusable in other places.

This is all plumbed through but there are two issues that need to be addressed before it can be merged:

  1. The method for determining whether to use the original halide_do_par_for depends on an LLVM codegen symbol table lookup that cannot work at the Halide IR level. I believe this should be resolvable by doing a walk of the Halide IR tree, but haven't yet implemented this. (I will do so shortly.)

  2. Some tests use the call graph checker to validate a dependency graph of logical calls between Funcs. The checker does not recover the full set of dependencies across lowered functions when closures are made for parallel or async scheduled Funcs. @abadams and I will have to figure out how to handle this. For now the failing tests are disabled with a TODO. (correctness_func_wrapper, correctness_image_wrapper, and correctness_rfactor are the tests in question.)

Making this work required adding some intrinsics to declare and populate typed structures. The main technique here is that declaring a struct returns a properly typed null pointer to that struct, and the fully elaborated backing type can be recovered from this null pointer. (In LLVM this is just using the getType() method on the Value; in C++ it is recovered using decltype.) This works around Halide's type system not being rich enough to handle everything in C++ (or LLVM, for that matter). It might be possible to register more detailed type info for Handle types for a given compilation, but that seemed more complicated and would likely require registering dynamically created data with the compiler for an unknown lifetime, posing a memory management problem.
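The null-pointer trick described above can be illustrated with a minimal standalone sketch (not the actual Halide code; the struct and function names here are purely illustrative):

```cpp
#include <type_traits>

// A typed null pointer carries a fully elaborated struct type; the C++
// backend can later recover that type from the value alone via decltype.
struct closure_example {
    int min;
    int extent;
};

int demo() {
    // "Declaring" the struct yields a properly typed null pointer...
    closure_example *type_carrier = nullptr;
    // ...and the full type is recovered from that value, not from a name:
    using recovered = std::remove_pointer<decltype(type_carrier)>::type;
    recovered instance{3, 8};
    return instance.min + instance.extent;
}
```

No cast between a type and a value is involved; the value exists only so that decltype (or getType() in LLVM) has something to inspect.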

New intrinsics are make_struct_type, make_typed_struct, load_struct_member, resolve_function_name, get_user_context, and get_pointer_symbol_or_null. I will not defend this set of additions as a shining example of design, but it also certainly isn't lowering the bar for Halide intrinsics. Happy to consider alternative approaches if folks have ideas.

@dsharletg (Contributor) left a comment:

Just a quick first pass through this.

@alexreinking (Member) commented:

Looks like you need to update src/CMakeLists.txt in the same way you updated the Makefile.

- normalize local functions to snake_case
- put all local functions & classes in anon namespace
- move MinThreads visitor to file scope to reduce nestedness of code
@zvookin (Member, Author) commented Dec 2, 2021:

> I'm totally confused by the need to pass types as values here. I'm not sure why closure generation and passing would require that.

(The code is not literally passing types as values, rather it creates a value and recovers its type at another point in the generated code. Yes the value is only used to carry the type, but it is perfectly type safe and uses well established mechanisms in LLVM and C++. I want to make sure it is understood there is no weird casting between llvm::Type * and llvm::Value * or anything like that.)

Closure creation is effectively:

```cpp
struct closure_struct_unique_name {
    // fields of this specific closure
};

int closure_callback_unique_name(void *closure_data_arg, ...) {
    closure_struct_unique_name *closure_data = (closure_struct_unique_name *)closure_data_arg;
    // unpack fields from closure_data.
    // ...
}

int some_halide_kernel(...) {
    closure_struct_unique_name closure_data = { ... };
    halide_do_par_for_or_something((void *)&closure_data, ...);
}
```

That needs to be modeled in Halide IR and codegened into LLVM IR and C++. I chose to do this by having a call to declare the type and return it as a typed pointer to that type. (One can declare just the type, and either with its fields or as a forward declaration to reference a preexisting type e.g. in the runtime. In that case a nullptr is returned. One can also declare the type as part of constructing a value of that type. Either value will serve to represent the type later.) Later codegen recovers the type from the pointer either using getType() in LLVM or decltype in C++. This is typesafe and allows adding other ways to produce such types in the future.

We also need to handle calling routines with pointer-to-structure-type arguments, specifically in the runtime. This is transitive: in some cases, the pointer is inside another structure or array.

Other alternatives would be to binary pack everything and not use struct types at all. Except that we still need struct types when calling runtime routines that take a pointer to a specific structure type. The backend code needs to typecheck. So we have to pass the type somehow or recover it from the reference to the called runtime function. The latter is easy to do in LLVM, but very hard to do in C++.

In this alternative model, we almost certainly end up representing the struct type with a string that is its name. Though it probably is actually a string plus an int because it may be an array. This only works if the names are unique except not so unique that they don't refer back to the thing one wants them to when one wants them to. (E.g. the infamous C++ error messages about struct foo * not being type compatible with struct foo *. )

If LLVM were to completely eliminate types on pointers, it might be possible to just nerf this mechanism into non-existence for that backend and have things compile fine. I'll be a bit surprised if they remove pointer types to that degree, but if so, not a problem.

```cpp
for (const auto &b : closure.buffers) {
    Expr ptr_var = Variable::make(type_of<void *>(), b.first);
    closure_elements.emplace_back(ptr_var);
    closure_elements.emplace_back(Call::make(type_of<halide_buffer_t *>(), Call::get_pointer_symbol_or_null,
```
Review comment (Member):

Why isn't this Variable::make(type_of<halide_buffer_t *>(), b.first + ".buffer")?

Why might the buffer exist sometimes and not exist other times, and it's OK if it doesn't exist?

If the buffer might not exist and that's OK for some reason, how about keeping a Scope<Expr> of all containing letstmts and using scope.contains(buffer_name) ? scope.get(buffer_name) : reinterpret(Handle(), 0)
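The scope-based fallback suggested above can be sketched in isolation. This is a rough standalone illustration, with std::map standing in for Halide's Scope<Expr> and a null pointer standing in for reinterpret(Handle(), 0); the names are hypothetical, not Halide API:

```cpp
#include <map>
#include <string>

// Stand-in for a Scope<Expr> tracking the values bound by containing LetStmts.
using SymbolScope = std::map<std::string, const void *>;

// If the symbol is bound in an enclosing scope, use its value;
// otherwise fall back to a null handle.
const void *symbol_or_null(const SymbolScope &scope, const std::string &name) {
    auto it = scope.find(name);
    return it != scope.end() ? it->second : nullptr;
}
```

The point of the suggestion is that presence or absence is decided by walking the IR's own let bindings rather than by a codegen-time symbol-table lookup.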

```cpp
// Allocate task list array
Expr tasks_list = Call::make(Handle(), Call::make_typed_struct, tasks_array_args, Call::PureIntrinsic);
Expr user_context = Call::make(type_of<void *>(), Call::get_user_context, {}, Call::PureIntrinsic);
Expr task_parent = Call::make(Handle(), Call::get_pointer_symbol_or_null,
```
Review comment (Member):

Same comment applies about keeping a Scope<Expr> instead

@abadams (Member) commented Dec 3, 2021:

What do you think about having a list of struct types inside the Module object instead of in the Stmt IR? A struct type would have a name and be either a primitive type or a vector of struct types. Then we'd have just the make_typed_struct and load_typed_struct_member intrinsics (or IR nodes), and they could refer to these global struct declarations by name.

The structs themselves would be passed around as Exprs of Handle() type. I think we should assume LLVM will eventually remove all types from pointer values. That's the plan AFAIK, and the API reflects that. The loads, stores, and GEPs are typed, but not the values themselves.

Finally, for function names as Handle()-valued Exprs, I think we should just use a Variable node with that name and type Handle(). These are symbols that are supposed to be in scope, so Variable is appropriate. At the start of codegenning a Module we could add declarations for all the LoweredFuncs in it and then add the corresponding names to the symbol table.

* Minor cleanup of parallel refactor intrinsics

- Renamed `load_struct_member` to `load_typed_struct_member` to make it more clear that it is intended for use only with the results of `make_typed_struct`.
- Split `declare_struct_type` into two intrinsics, `define_typed_struct` and `forward_declare_typed_struct`, removing the need for the underdocumented `mode` argument and hopefully making usage clearer
- Added / clarified comments for the intrinsics modified above

* Update comments

* Fix comments
```cpp
} else if (op->is_intrinsic(Call::define_typed_struct)) {
    // Emit a declaration like:
    //
    // struct {const int f_0, const char f_1, const int f_2} foo;
```
Review comment (Contributor):

Shouldn't this be:

```cpp
struct foo {const int f_0; const char f_1; const int f_2;};
```

?

Reply (Contributor):

Yep, my bad, will fix

```cpp
wrapped_body = LetStmt::make(b.first,
                             Call::make(type_of<void *>(), Call::load_struct_member,
                                        {closure_arg, struct_type, make_const(UInt(32), struct_index)},
                                        Call::PureIntrinsic),
```
Review comment (Contributor):

Resolved?

@steven-johnson (Contributor) commented:

> What do you think about having a list of struct types inside the Module object instead of in the Stmt IR?

My first thought is that it would add new management challenges to the IR; currently our IR does a good job of being standalone at any point in the tree, but this would introduce a class of IR nodes that could only be correctly used inside a properly-configured Module.

@abadams (Member) commented Dec 6, 2021:

It's not without precedent: A Buffer object, which is referred to by name, can only be used inside a correctly configured Module. Modules are like C translation units.

This is a struct type that has to be shared between multiple LoweredFunc objects. It exists in the same container that holds those LoweredFuncs, not within the LoweredFuncs themselves. By way of analogy, in C if you have a struct type shared by multiple functions, you put that struct type declaration outside of the functions.
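The C analogy above can be made concrete with a tiny sketch (illustrative names only): the shared struct type lives at translation-unit scope, outside the functions that use it, just as the proposal puts shared struct types in the Module rather than in each LoweredFunc.

```cpp
// Declared once at translation-unit scope, shared by multiple functions —
// the analogue of holding the struct type in the Module.
struct shared_closure {
    int min;
    int extent;
};

int producer(const shared_closure *c) { return c->min; }
int consumer(const shared_closure *c) { return c->extent; }
```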

@zvookin (Member, Author) commented Dec 7, 2021:

I have no doubt there is a better design to add aggregate types to the IR, but I have trouble seeing how adding structs at the module level is any better than this current approach. E.g. at least this design prints out all the information in debug output. I also don't see a whole lot of cost in changing this later when we have a better design, thus it seems a good first step on the way to something else.

We kind of need this support to deliver certain things, and it's been in code review since mid-August, so if there are concrete objections to how it works I'd like to focus on those. Moving the struct types out a level could easily be done in a second pass, though if we're going to do that, I'd suggest actually adding scoping as well.

@abadams (Member) commented Dec 7, 2021:

My concrete objections to the design are the declaration of types inside imperative Stmt IR, which doesn't mentally type check for me, and the use of StringImms to refer to things in some sort of notional scope, which has caused us considerable pain in the past because passes that track dependencies between bits of IR have to special-case the StringImms.

I want to add the minimum number of new intrinsics. I think we can do this with two or three, not six.

@abadams (Member) commented Dec 7, 2021:

If you don't want to put things in Module, then I woke up with some ideas that would meet my design constraints:

Option 1:

- Make a StructType object, which is either a Type or a list of StructType objects, held behind an IntrusivePtr. There's also an optional name field, which is empty for anonymous types.
- Make new IR nodes MakeTypedStruct and LoadStructMember, both of which have StructType fields that give the type. MakeTypedStruct also has a vector of Expr and is of type Handle(). LoadStructMember also takes a single int giving the index and is of the type of that struct field.

Option 2:

- Make a StructType object as above, and add it as an optionally-defined field to Type. There can exist Handle types with the StructType field defined.
- Have the existing intrinsic make_struct have the StructType field set on its Type. The args are just the members.
- Add a new intrinsic load_struct_member which takes something of type Handle, which must have the StructType field set on its Type, and an int that gives the index of the struct member.

With both of these options, you would discover the StructTypes while traversing the IR. In the backends you would add a declaration of them to the preamble as you discover them, if you haven't hit that StructType yet. We'd change the existing uses of the make_struct intrinsic accordingly (as a follow-up PR).

Both options avoid the StringImms and avoid an imperative piece of IR used to declare a type.
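The recursive StructType described in the proposal could look roughly like the following. This is a hypothetical sketch under the stated constraints (a type is either a primitive leaf or a list of nested StructTypes, with an optional name); the field names, the string-based leaf encoding, and shared_ptr in place of Halide's IntrusivePtr are all illustrative choices, not a committed design:

```cpp
#include <memory>
#include <string>
#include <vector>

// Either a primitive (leaf) type or an aggregate of nested StructTypes.
struct StructType {
    std::string name;       // empty for anonymous types
    std::string primitive;  // non-empty for leaf types, e.g. "int32"
    std::vector<std::shared_ptr<StructType>> members;  // non-empty for aggregates

    bool is_leaf() const { return members.empty(); }
};

// Build a named struct with two primitive members.
std::shared_ptr<StructType> make_example() {
    auto i32 = std::make_shared<StructType>(StructType{"", "int32", {}});
    auto ptr = std::make_shared<StructType>(StructType{"", "handle", {}});
    return std::make_shared<StructType>(
        StructType{"closure_t", "", {i32, ptr}});
}
```

A backend walking the IR would emit a declaration the first time it encounters a given StructType, matching the discover-as-you-traverse scheme described above.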

This is a direct adaptation of what #6481 does for the old LLVM-based code, and allows elimination of one use of `get_pointer_or_null()`. PR is meant to be merged into factor_parallel_codegen, not master.
@zvookin (Member, Author) commented Dec 8, 2021:

My position is that this PR is a meaningful improvement in that it moves a significant component of codegen from the LLVM level to the IR level and allows that to be reused across backends. The intrinsics to me are scaffolding and I am in agreement that they are suboptimal, but they are also fine for now. The question is whether the best path forward is to merge the PR and then work on moving the intrinsics to e.g. new IR nodes or if we hold it up for another couple months to do that. The bar for introducing new IR nodes is much higher as it introduces code changes in a lot more places. Removing intrinsics is pretty easy in comparison to adding/modifying/removing an IR node.

E.g. the get_symbol_pointer_or_null thing was to mirror code that was in the LLVM backend and I didn't want to change the behavior because I needed to get the whole thing working and passing tests. (The behavior was also used in a couple other places originally, one for user_context and the other for task_parent. Both of those had other solutions, but the LLVM code really did do a check on the scoped values to implement them.) It turns out the main reason this intrinsic was there is probably dead code anyway and hey, maybe that could have been figured out earlier, but I have enough trouble with not doing things because I'm trying for perfection rather than getting it done. Nuking get_symbol_pointer_or_null is great, but it would have been fine to do that after landing the PR too. (And probably more conservative in that if the .buffer maybe/maybe not code is there for a reason, it would give a place to drop back to without backing out this entire PR.)

I am happy to open issues and add TODOs and even to commit to doing the work to get rid of the intrinsics in some timeframe. But really I do not think they are enough of a problem to create a maintenance burden. Even the LLVM opaque pointer change implies mostly just removing code. (At the cost of losing type checking at the LLVM IR layer.) At worst it requires adding a bit of type info to the scope store.

@abadams (Member) commented Dec 9, 2021:

I ran out of time today, but I'm working up one of my StructType proposals. We can take a look at it and see if it's better or worse.

steven-johnson and others added 3 commits December 9, 2021 11:47
I think this is the right fix for this case; that said, I can't find a single instance in any of our test cases that actually triggers this.
* Rework some of parallel closure lowering to avoid some intrinsics

This version relies more heavily on the existing make_struct, and puts
function names in the Variable namespace as globals.

Co-authored-by: Steven Johnson <srj@google.com>
@steven-johnson (Contributor) left a comment:

LGTM

@steven-johnson (Contributor) commented:

Ready to merge.

@steven-johnson steven-johnson merged commit 46d8ca8 into master Dec 14, 2021
@steven-johnson steven-johnson deleted the factor_parallel_codegen branch December 14, 2021 00:31
steven-johnson added a commit that referenced this pull request Dec 15, 2021
Handles need to be `auto *` for the previous PR to work properly. (This should be refactored more intelligently to reduce code duplication; this is just a quick fix to unbreak.)