Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

factor parallel codegen with fewer intrinsics #6487

Merged

Conversation

abadams
Copy link
Member

@abadams abadams commented Dec 9, 2021

This version relies more heavily on the existing make_struct, and puts function names in the Variable namespace as globals.

I ended up not doing a StructType, because it didn't seem to pay for itself compared to using a prototype, as the existing code did. It turns out in that approach it was still possible to remove many of the intrinsics.

This version relies more heavily on the existing make_struct, and puts
function names in the Variable namespace as globals.
Copy link
Contributor

@steven-johnson steven-johnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM so far. Going to pull it locally and see how the output looks.

src/Closure.cpp Outdated Show resolved Hide resolved
src/Closure.cpp Show resolved Hide resolved
src/Closure.cpp Show resolved Hide resolved
src/CodeGen_C.cpp Outdated Show resolved Hide resolved
src/CodeGen_C.cpp Show resolved Hide resolved
src/CodeGen_C.cpp Outdated Show resolved Hide resolved
src/CodeGen_LLVM.cpp Outdated Show resolved Hide resolved
src/CodeGen_LLVM.cpp Show resolved Hide resolved
src/LowerParallelTasks.cpp Show resolved Hide resolved
@zvookin
Copy link
Member

zvookin commented Dec 9, 2021

I'm going to hand the rest of this over to Steven. He and I are talking as needed.

test/common/check_call_graphs.h Outdated Show resolved Hide resolved
if (op->is_producer) {
std::string old_producer = producer;
producer = op->name;
calls[producer]; // Make sure each producer is allocated a slot
// Group the callees of the 'produce' and 'update' together
op->body.accept(this);
auto new_stmt = mutate(op->body);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand the nature of the changes here at all -- using a Mutator here to check correctness makes me a little nervous. What is the upside of this approach?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pattern from other tests. If you want to validate IR at the end of lowering, but before backend stuff starts, you can do it with a custom lowering pass. Those must be mutators, not visitors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the upside is that it happens before we split things off into closures, which removes the complexity of handling that from this test.

test/correctness/func_clone.cpp Outdated Show resolved Hide resolved
@steven-johnson
Copy link
Contributor

test_internal is broken because of C++ codegen differences

stream << get_indent() << print_type(t, AppendSpace) << const_flag << id << " = " << rhs << ";\n";
if (t.is_handle()) {
// Don't print void *, which might lose useful type information. just use auto.
stream << get_indent() << "auto *";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like my suggestion about auto * was bad; I made a clone of this for testing and there are failures because we generated code of the form

pipeline_c.halide_generated.cpp:3720:17: error: unable to deduce ‘auto*’ from ‘0’
    auto *_275 = 0;

(see https://buildbot.halide-lang.org/master/#/builders/118/builds/541)

...that said, is just using auto here ok if the RHS is a plain 0? If the type is Handle then surely we want to ensure it's at least pointer sized (if not explicitly 64 bits)...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we could turn that 0 into 'nullptr'. I'll take a look.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was an unused value, so it was actually fine, but it was a place where we were explicitly deciding to use 0 for a pointer, so it was easy to change to nullptr

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now we get

pipeline_c.halide_generated.cpp:3720:17: error: unable to deduce ‘auto*’ from ‘nullptr’
    auto *_275 = nullptr;

Copy link
Contributor

@steven-johnson steven-johnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, ok to merge.

@abadams abadams merged commit c70badd into factor_parallel_codegen Dec 13, 2021
steven-johnson added a commit that referenced this pull request Dec 14, 2021
…R lowering pass. (#6195)

* First cut at factoring parallel task compilation, including closure
generating and calling, into a normal IR to IR lowering pass.

Includes adding struct handling intrinsics to LLVM and C++ backends.

Still a work in progress.

* Fix formating that got munged by emacs somehow.

* Checkpoint progress.

* Small fixes.

* Checkpoint progress.

* Checkpoint preogress.

* Checkpoint progress.

* Checkpoint progress. Debugging code will be removed.

* Try a fix for make_typed_struct in C++ codegen.

* Another attempt to fix C++ codegen.

* Another C codegen fix.

* Checkpoint progress.

* Use make_typed_struct rather than make_struct to construct
closure. Ensure all types are carried through exactly the same to both
make_struct_type and make_typed_struct.

* Checkpoint.

* Uniqueify closure names because LLVM was doing that to function names.

* Small formatting cleanups.

Fixes to call graph checker. Disable tests related to this while
Andrew and I figure out how to get it to work across closures.

* Get generated C++ to compile via a combination of fixing types and
bludgeoning types into submission via subterfuge.

* Typo fix to a typo fix.

* Restore inadvertently deleted code.

* Rename make_struct_type to declare_struct_type.

* Add new file to CMake.

* Add fixes for Hexagon offload and any passes that might add additional
LoweredFunctions in the future.

* Add comment with a bit of info for the future..

* Typo fix.

* Don't duplicate the closure call to test the error return.

Don't declare halide_buffer_t type if there are no buffers in closure.

* Use _ucon in C++ code to get rid of constness casting ugliness.

* Change resolve_function_name intrinsic to use a Call node to designate
the function. This makes generating the C++ declaration in the C++
backend trivial.

Few more changes to type handling and naming.

* Small C++ backend output formating change.

Don't generate For loops with no variable.

Update internal test for C++ output.

* Add halide_semaphore_acquire_t as a well known type for use inside compiler.

* Add handling for halide_semaphore_t allocation.

Formating fixes.

* Fix type for halide_semaphore_t.

* Reapply C++ backend formatting fix.

* Add support for calling legacy halide_do_par_for runtime routine in
cases where it is valid.

* Formatting fixes.

* Format and tidy fixes.

* Attempt to pass formatting check.

* Fix last set of test failures.

* Formatting whitespace fixes.

* Update comments.

* Attempt to fix pointer cast error with some versions of LLVM.

* Another attempt at fixing bool compatibility casting.

* Another iteration.

* Remove likely useless extern argument check logic.

* Add hacky fix for losing global variables.

* Comment typo fixes.

* Remove no-longer-used Closure code from Codegen_Internal

* Remove unused MayBlock visitor class

* clang-tidy

* Attempt to fix parallel offloads for HVX

* Update parallel_nested_1.cpp

* Augment Closure debugging

* Add some std::move usage

* Fix hvx lock/unlock semantics for PR #6457 (#6462)

Fix qurt_hvx_lock issues

* Sort IntrinsicOp and corresponding names

* Remove unused `is_const_pointer()` function

* Minor hygiene in LowerParallelTasks

- normalize local functions to snake_case
- put all local functions & classes in anon namespace
- move MinThreads visitor to file scope to reduce nestedness of code

* use Closure::include

* Switch to PureIntrinsics per review feedback.

* Minor cleanup of parallel refactor intrinsics (#6465)

* Minor cleanup of parallel refactor intrinsics

- Renamed `load_struct_member` to `load_typed_struct_member` to make it more clear that it is intended for use only with the results of `make_typed_struct`.
- Split `declare_struct_type` into two intrinsics, `define_typed_struct` and `forward_declare_typed_struct`, removing the need for the underdocumented `mode` argument and hopefully making usage clearer
- Added / clarified comments for the intrinsics modified above

* Update comments

* Fix comments

* Update CodeGen_C.cpp

* Remove 'foo.buffer' from Closure entirely

This is a direct adaptation of what #6481 does for the old LLVM-based code, and allows elimination of one use of `get_pointer_or_null()`. PR is meant to be merged into factor_parallel_codegen, not master.

* Update LowerParallelTasks.cpp

* Keep track of task_parent inside LowerParallelTasks; remove no-longer-needed get_pointer_or_symbol() intrinsic (#6486)

* Fix potential issue with additional LoweredFuncs (#6490)

I think this is the right fix for this case; that said, I can't find a single instance in any of our test cases that actually triggers this.

* factor parallel codegen with fewer intrinsics (#6487)

* Rework some of parallel closure lowering to avoid some intrinsics

This version relies more heavily on the existing make_struct, and puts
function names in the Variable namespace as globals.

Co-authored-by: Steven Johnson <srj@google.com>

Co-authored-by: dsharletg <dsharlet@google.com>
Co-authored-by: Steven Johnson <srj@google.com>
Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com>
@steven-johnson steven-johnson deleted the abadams/factor_parallel_codegen_fewer_intrinsics branch December 14, 2021 00:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants