Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move parallel/async lowering from LLVM codegen to a standard Halide IR lowering pass. #6195

Merged
merged 85 commits into from
Dec 14, 2021
Merged
Show file tree
Hide file tree
Changes from 77 commits
Commits
Show all changes
85 commits
Select commit Hold shift + click to select a range
0b7bd25
First cut at factoring parallel task compilation, including closure
Jul 22, 2021
e3a14b0
Fix formating that got munged by emacs somehow.
Jul 23, 2021
e06a2a8
Merge branch 'master' into factor_parallel_codegen
Jul 23, 2021
7e90c97
Checkpoint progress.
Jul 30, 2021
c9ab059
Small fixes.
Jul 31, 2021
a6d9f1d
Checkpoint progress.
Aug 1, 2021
a2a92a4
Checkpoint preogress.
Aug 2, 2021
8375961
Checkpoint progress.
Aug 2, 2021
fbb05ab
Checkpoint progress. Debugging code will be removed.
Aug 3, 2021
f8de53e
Try a fix for make_typed_struct in C++ codegen.
Aug 3, 2021
c5d902e
Another attempt to fix C++ codegen.
Aug 3, 2021
b1f8c6c
Another C codegen fix.
Aug 4, 2021
5eef291
Checkpoint progress.
Aug 4, 2021
3391b6d
Merge branch 'master' into factor_parallel_codegen
Aug 4, 2021
ade53a2
Use make_typed_struct rather than make_struct to construct
Aug 6, 2021
980c447
Checkpoint.
Aug 11, 2021
7071d40
Uniqueify closure names because LLVM was doing that to function names.
Aug 11, 2021
9b5c399
Small formatting cleanups.
Aug 11, 2021
ffb2887
Get generated C++ to compile via a combination of fixing types and
Aug 12, 2021
91d7130
Typo fix to a typo fix.
Aug 12, 2021
6785903
Merge branch 'master' into factor_parallel_codegen
Aug 12, 2021
c711bb1
Restore inadvertently deleted code.
Aug 12, 2021
d6ac3e9
Rename make_struct_type to declare_struct_type.
Aug 12, 2021
f157165
Add new file to CMake.
Aug 12, 2021
3c33517
Add fixes for Hexagon offload and any passes that might add additional
Aug 13, 2021
c4f517b
Add comment with a bit of info for the future..
Aug 13, 2021
69db2ad
Typo fix.
Aug 14, 2021
b5fd7f9
Don't duplicate the closure call to test the error return.
Aug 14, 2021
2e3660d
Use _ucon in C++ code to get rid of constness casting ugliness.
Aug 16, 2021
b8272fc
Merge branch 'master' into factor_parallel_codegen
Aug 17, 2021
4f345e0
Change resolve_function_name intrinsic to use a Call node to designate
Aug 18, 2021
ea9fa9a
Small C++ backend output formating change.
Aug 19, 2021
9cdb347
Add halide_semaphore_acquire_t as a well known type for use inside co…
Aug 19, 2021
147f381
Add handling for halide_semaphore_t allocation.
Aug 19, 2021
a71ecf1
Fix type for halide_semaphore_t.
Aug 19, 2021
d68ca8c
Reapply C++ backend formatting fix.
Aug 19, 2021
f00a4e5
Merge branch 'master' into factor_parallel_codegen
Aug 26, 2021
c0526eb
Add support for calling legacy halide_do_par_for runtime routine in
Aug 28, 2021
739a358
Formatting fixes.
Aug 28, 2021
dd0f8ed
Format and tidy fixes.
Aug 28, 2021
1fe603b
Attempt to pass formatting check.
Aug 28, 2021
e8f87ac
Merge branch 'master' into factor_parallel_codegen
Aug 30, 2021
71e5612
Merge branch 'master' into factor_parallel_codegen
Sep 14, 2021
2f24967
Fix last set of test failures.
Sep 17, 2021
430cab2
Formatting whitespace fixes.
Sep 17, 2021
d37a16c
Update comments.
Sep 17, 2021
6cefab2
Merge branch 'master' into factor_parallel_codegen
Sep 23, 2021
9a3d926
Attempt to fix pointer cast error with some versions of LLVM.
Sep 28, 2021
1aac285
Another attempt at fixing bool compatibility casting.
Sep 29, 2021
e8e296b
Another iteration.
Sep 29, 2021
2da926b
Merge branch 'master' into factor_parallel_codegen
Sep 29, 2021
3946ccd
Remove likely useless extern argument check logic.
Oct 1, 2021
687b71b
Merge branch 'master' into factor_parallel_codegen
Oct 2, 2021
c87399c
Merge branch 'master' into factor_parallel_codegen
Nov 5, 2021
00a0715
Add hacky fix for losing global variables.
dsharletg Nov 15, 2021
5d93f1e
Comment typo fixes.
Nov 19, 2021
f49f800
Merge branch 'master' into factor_parallel_codegen
Nov 19, 2021
5733306
Merge branch 'master' into factor_parallel_codegen
steven-johnson Nov 22, 2021
114d209
Merge branch 'master' into factor_parallel_codegen
steven-johnson Nov 23, 2021
df247f9
Remove no-longer-used Closure code from Codegen_Internal
steven-johnson Nov 23, 2021
b9ac7d6
Remove unused MayBlock visitor class
steven-johnson Nov 23, 2021
6528ba6
clang-tidy
steven-johnson Nov 23, 2021
2fd4c9f
Attempt to fix parallel offloads for HVX
steven-johnson Nov 23, 2021
e71bb81
Merge branch 'factor_parallel_codegen' of https://github.com/halide/H…
steven-johnson Nov 23, 2021
e64651b
Merge branch 'master' into factor_parallel_codegen
steven-johnson Nov 29, 2021
7d2c2c6
Update parallel_nested_1.cpp
steven-johnson Nov 29, 2021
e02571b
Augment Closure debugging
steven-johnson Nov 30, 2021
7eb9f5f
Add some std::move usage
steven-johnson Nov 30, 2021
f02b1e9
Merge branch 'master' into factor_parallel_codegen
steven-johnson Dec 1, 2021
0faca07
Fix hvx lock/unlock semantics for PR #6457 (#6462)
steven-johnson Dec 2, 2021
d180598
Merge branch 'master' into factor_parallel_codegen
steven-johnson Dec 2, 2021
9af64ae
Sort IntrinsicOp and corresponding names
steven-johnson Dec 2, 2021
25b7b77
Remove unused `is_const_pointer()` function
steven-johnson Dec 2, 2021
387c58f
Minor hygiene in LowerParallelTasks
steven-johnson Dec 2, 2021
24a6eb2
Merge branch 'master' into factor_parallel_codegen
steven-johnson Dec 2, 2021
a1d267e
use Closure::include
steven-johnson Dec 2, 2021
c21e701
Switch to PureIntrinsics per review feedback.
Dec 3, 2021
b6ba514
Minor cleanup of parallel refactor intrinsics (#6465)
steven-johnson Dec 6, 2021
2923246
Update CodeGen_C.cpp
steven-johnson Dec 6, 2021
5764d12
Merge branch 'master' into factor_parallel_codegen
steven-johnson Dec 8, 2021
64db40f
Remove 'foo.buffer' from Closure entirely
steven-johnson Dec 8, 2021
bdd7857
Update LowerParallelTasks.cpp
steven-johnson Dec 8, 2021
5cf15ea
Keep track of task_parent inside LowerParallelTasks; remove no-longer…
steven-johnson Dec 9, 2021
57412df
Fix potential issue with additional LoweredFuncs (#6490)
steven-johnson Dec 9, 2021
c70badd
factor parallel codegen with fewer intrinsics (#6487)
abadams Dec 13, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -491,6 +491,7 @@ SOURCE_FILES = \
LLVM_Runtime_Linker.cpp \
LoopCarry.cpp \
Lower.cpp \
LowerParallelTasks.cpp \
LowerWarpShuffles.cpp \
MatlabWrapper.cpp \
Memoization.cpp \
Expand Down Expand Up @@ -669,6 +670,7 @@ HEADER_FILES = \
LLVM_Runtime_Linker.h \
LoopCarry.h \
Lower.h \
LowerParallelTasks.h \
LowerWarpShuffles.h \
MainPage.h \
MatlabWrapper.h \
Expand Down
2 changes: 1 addition & 1 deletion src/AsyncProducers.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -333,7 +333,7 @@ class ForkAsyncProducers : public IRMutator {
vector<Expr> sema_vars;
for (int i = 0; i < consumes.count; i++) {
sema_names.push_back(op->name + ".semaphore_" + std::to_string(i));
sema_vars.push_back(Variable::make(Handle(), sema_names.back()));
sema_vars.push_back(Variable::make(type_of<halide_semaphore_t *>(), sema_names.back()));
}

Stmt producer = GenerateProducerBody(op->name, sema_vars, cloned_acquires).mutate(body);
Expand Down
2 changes: 2 additions & 0 deletions src/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,7 @@ set(HEADER_FILES
LLVM_Runtime_Linker.h
LoopCarry.h
Lower.h
LowerParallelTasks.h
LowerWarpShuffles.h
MainPage.h
MatlabWrapper.h
Expand Down Expand Up @@ -258,6 +259,7 @@ set(SOURCE_FILES
LLVM_Runtime_Linker.cpp
LoopCarry.cpp
Lower.cpp
LowerParallelTasks.cpp
LowerWarpShuffles.cpp
MatlabWrapper.cpp
Memoization.cpp
Expand Down
19 changes: 15 additions & 4 deletions src/Closure.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,10 @@ namespace Internal {

using std::string;

namespace {
constexpr int DBG = 3;
} // namespace

void Closure::include(const Stmt &s, const string &loop_variable) {
if (!loop_variable.empty()) {
ignore.push(loop_variable);
Expand Down Expand Up @@ -38,7 +42,7 @@ void Closure::visit(const For *op) {
void Closure::found_buffer_ref(const string &name, Type type,
bool read, bool written, const Halide::Buffer<> &image) {
if (!ignore.contains(name)) {
debug(3) << "Adding buffer " << name << " to closure\n";
debug(DBG) << "Adding buffer " << name << " to closure:\n";
Buffer &ref = buffers[name];
ref.type = type.element_of(); // TODO: Validate type is the same as existing refs?
ref.read = ref.read || read;
Expand All @@ -49,8 +53,15 @@ void Closure::found_buffer_ref(const string &name, Type type,
ref.size = image.size_in_bytes();
ref.dimensions = image.dimensions();
}
debug(DBG) << " "
<< " t=" << ref.type
<< " d=" << (int)ref.dimensions
<< " r=" << ref.read
<< " w=" << ref.write
<< " mt=" << (int)ref.memory_type
<< " sz=" << ref.size << "\n";
} else {
debug(3) << "Not adding " << name << " to closure\n";
debug(DBG) << "Not adding buffer " << name << " to closure\n";
}
}

Expand Down Expand Up @@ -81,9 +92,9 @@ void Closure::visit(const Allocate *op) {

void Closure::visit(const Variable *op) {
if (ignore.contains(op->name)) {
debug(3) << "Not adding " << op->name << " to closure\n";
debug(DBG) << "Not adding var " << op->name << " to closure\n";
} else {
debug(3) << "Adding " << op->name << " to closure\n";
debug(DBG) << "Adding var " << op->name << " to closure\n";
vars[op->name] = op->type;
}
}
Expand Down
147 changes: 138 additions & 9 deletions src/CodeGen_C.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1820,7 +1820,7 @@ void CodeGen_C::compile(const LoweredFunc &f, const std::map<std::string, std::s
set_name_mangling_mode(name_mangling);

std::vector<std::string> namespaces;
std::string simple_name = extract_namespaces(f.name, namespaces);
std::string simple_name = c_print_name(extract_namespaces(f.name, namespaces), false);
if (!is_c_plus_plus_interface()) {
user_assert(namespaces.empty()) << "Namespace qualifiers not allowed on function name if not compiling with Target::CPlusPlusNameMangling.\n";
}
Expand Down Expand Up @@ -2369,12 +2369,19 @@ void CodeGen_C::visit(const Call *op) {
} else if (op->is_intrinsic(Call::alloca)) {
internal_assert(op->args.size() == 1);
internal_assert(op->type.is_handle());
const int64_t *sz = as_const_int(op->args[0]);
if (op->type == type_of<struct halide_buffer_t *>() &&
Call::as_intrinsic(op->args[0], {Call::size_of_halide_buffer_t})) {
stream << get_indent();
string buf_name = unique_name('b');
stream << "halide_buffer_t " << buf_name << ";\n";
rhs << "&" << buf_name;
} else if (op->type == type_of<struct halide_semaphore_t *>() &&
sz && *sz == 16) {
stream << get_indent();
string semaphore_name = unique_name("sema");
stream << "halide_semaphore_t " << semaphore_name << ";\n";
rhs << "&" << semaphore_name;
} else {
// Make a stack of uint64_ts
string size = print_expr(simplify((op->args[0] + 7) / 8));
Expand Down Expand Up @@ -2460,6 +2467,110 @@ void CodeGen_C::visit(const Call *op) {
}
rhs << "(&" << struct_name << ")";
}
} else if (op->is_intrinsic(Call::declare_struct_type)) {
// Emit a declaration like:
// struct {const int f_0, const char f_1, const int f_2} foo;

// Declares a struct type. Returns a null pointer of the new type.
internal_assert(op->args.size() >= 2);

const StringImm *str_imm = op->args[0].as<StringImm>();
internal_assert(str_imm != nullptr);
std::string name = c_print_name(str_imm->value, false);

const int64_t *mode = as_const_int(op->args[1]);
internal_assert(mode != nullptr);

if (*mode == 1 || !starts_with(name, "halide_")) {
steven-johnson marked this conversation as resolved.
Show resolved Hide resolved
stream << get_indent() << "struct " << name;
if (*mode == 1) {
stream << " {\n";
// List the types.
indent++;
for (size_t i = 2; i < op->args.size(); i++) {
stream << get_indent() << print_type(op->args[i].type(), AppendSpace) << "f_" << i - 2 << ";\n";
}
indent--;
stream << get_indent() << "};\n";
} else {
stream << ";\n";
}
}
rhs << "((" << name << " *)nullptr)";
} else if (op->is_intrinsic(Call::make_typed_struct)) {
steven-johnson marked this conversation as resolved.
Show resolved Hide resolved
internal_assert(op->args.size() >= 2);

string type_carrier = print_expr(op->args[0]);
const int64_t *count_ptr = as_const_int(op->args[1]);
internal_assert(count_ptr != nullptr);
int64_t count = *count_ptr;

// Get the args
vector<string> values;
for (size_t i = 2; i < op->args.size(); i++) {
values.push_back(print_expr(op->args[i]));
}
string struct_name = unique_name('s');
string array_specifier;
if (count > 0) {
array_specifier = "[]";
}
stream << get_indent() << "std::remove_pointer<decltype(" << type_carrier << ")>::type " << struct_name << array_specifier << " = {\n";
// List the values.
indent++;
int item = 0;
size_t initializer_count = values.size() / count;
internal_assert((values.size() % count) == 0);
do {
if (count > 0) {
stream << get_indent() << "{\n";
indent++;
}
for (size_t i = 0; i < initializer_count; i++) {
stream << get_indent() << values[i + item * initializer_count];
if (i < op->args.size() - 1) {
stream << ",";
}
stream << "\n";
}
if (count > 0) {
indent--;
stream << get_indent() << "},\n";
}
item += 1;
} while (item < count);
indent--;
stream << get_indent() << "};\n";
// Return a pointer to it of the appropriate type

// TODO: This is dubious type-punning. We really need to
// find a better way to do this. We dodge the problem for
// the specific case of buffer shapes in the case above.
if (op->type.handle_type) {
rhs << "(" << print_type(op->type) << ")";
}
rhs << "(&" << struct_name << ((count > 0) ? "[0])" : ")");
} else if (op->is_intrinsic(Call::load_struct_member)) {
internal_assert(op->args.size() == 3);
const uint64_t *index = as_const_uint(op->args[2]);
internal_assert(index != nullptr);
std::string struct_base = print_expr(op->args[0]);
std::string type_carrier = print_expr(op->args[1]);
rhs << "((decltype(" << type_carrier << "))" << struct_base << ")->"
<< "f_" << *index;
} else if (op->is_intrinsic(Call::resolve_function_name)) {
internal_assert(op->args.size() == 1);
const Call *decl_call = op->args[0].as<Call>();
internal_assert(decl_call != nullptr);
std::string c_name = c_print_name(decl_call->name, false);
rhs << "(&" << c_name << ")";
} else if (op->is_intrinsic(Call::get_user_context)) {
internal_assert(op->args.empty());
rhs << "_ucon";
} else if (op->is_intrinsic(Call::get_pointer_symbol_or_null)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LLVM implementation of this symbol attempts a symbol lookup, but this just returns null unconditionally, which is mighty puzzling. Is it a bug, or do I just misunderstand the usage? (Documentation somewhere would be helpful)

internal_assert(op->args.size() == 2);
// TODO(zalman|abadams): Figure out how to get rid of the maybe foo.buffer exists, maybe it doesn't thing in closures.
rhs << "(nullptr)";
} else if (op->is_intrinsic(Call::stringify)) {
// Rewrite to an snprintf
vector<string> printf_args;
Expand Down Expand Up @@ -2490,7 +2601,6 @@ void CodeGen_C::visit(const Call *op) {
stream << get_indent() << "char " << buf_name << "[1024];\n";
stream << get_indent() << "snprintf(" << buf_name << ", 1024, \"" << format_string << "\", " << with_commas(printf_args) << ");\n";
rhs << buf_name;

} else if (op->is_intrinsic(Call::register_destructor)) {
internal_assert(op->args.size() == 2);
const StringImm *fn = op->args[0].as<StringImm>();
Expand Down Expand Up @@ -2560,6 +2670,21 @@ void CodeGen_C::visit(const Call *op) {
stream << get_indent() << rhs.str() << ";\n";
// Make an innocuous assignment value for our caller (probably an Evaluate node) to ignore.
print_assignment(op->type, "0");
} else if (op->is_intrinsic(Call::declare_struct_type) ||
op->is_intrinsic(Call::make_typed_struct) ||
op->is_intrinsic(Call::load_struct_member) ||
op->is_intrinsic(Call::resolve_function_name)) {
// print_assigment will get the type info wrong for this case.
std::string rhs_str = rhs.str();
auto cached = cache.find(rhs_str);
if (cached == cache.end()) {
id = unique_name('_');
stream << get_indent() << "auto " << id << " = " << rhs_str << ";\n";
} else {
id = cached->second;
}
// Avoid unused variable warnings.
stream << get_indent() << "halide_unused(" << id << ");\n";
} else {
print_assignment(op->type, rhs.str());
}
Expand Down Expand Up @@ -2710,9 +2835,10 @@ void CodeGen_C::visit(const Let *op) {
if (op->value.type().is_handle()) {
// The body might contain a Load that references this directly
// by name, so we can't rewrite the name.
stream << get_indent() << print_type(op->value.type())
<< " " << print_name(op->name)
<< " = " << id_value << ";\n";
std::string name = print_name(op->name);
stream << get_indent() << "auto "
<< name << " = " << id_value << ";\n";
stream << get_indent() << "halide_unused(" << name << ");\n";
} else {
Expr new_var = Variable::make(op->value.type(), id_value);
body = substitute(op->name, new_var, body);
Expand Down Expand Up @@ -2800,12 +2926,14 @@ void CodeGen_C::visit(const VectorReduce *op) {
void CodeGen_C::visit(const LetStmt *op) {
string id_value = print_expr(op->value);
Stmt body = op->body;

if (op->value.type().is_handle()) {
// The body might contain a Load or Store that references this
// directly by name, so we can't rewrite the name.
stream << get_indent() << print_type(op->value.type())
<< " " << print_name(op->name)
<< " = " << id_value << ";\n";
std::string name = print_name(op->name);
stream << get_indent() << "auto "
<< name << " = " << id_value << ";\n";
stream << get_indent() << "halide_unused(" << name << ");\n";
} else {
Expr new_var = Variable::make(op->value.type(), id_value);
body = substitute(op->name, new_var, body);
Expand Down Expand Up @@ -3222,7 +3350,8 @@ HALIDE_FUNCTION_ATTRS
int test1(struct halide_buffer_t *_buf_buffer, float _alpha, int32_t _beta, void const *__user_context) {
void * const _ucon = const_cast<void *>(__user_context);
void *_0 = _halide_buffer_get_host(_buf_buffer);
void * _buf = _0;
auto _buf = _0;
halide_unused(_buf);
{
int64_t _1 = 43;
int64_t _2 = _1 * _beta;
Expand Down
24 changes: 24 additions & 0 deletions src/CodeGen_Hexagon.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -431,6 +431,30 @@ class InjectHVXLocks : public IRMutator {
}
Expr visit(const Call *op) override {
uses_hvx = uses_hvx || op->type.is_vector();

if (op->name == "halide_do_par_for") {
// If we see a call to halide_do_par_for() at this point, it should mean that
// this statement was produced via HexagonOffload calling lower_parallel_tasks()
// explicitly; in this case, we won't see any parallel For statements, since they've
// all been transformed into closures already. To mirror the pattern above,
// we need to wrap the halide_do_par_for() call with an unlock/lock pair, but
// that's hard to do in Halide IR (we'd need to produce a Stmt to enforce the ordering,
// and the resulting Stmt can't easily be substituted for the Expr here). Rather than
// make fragile assumptions about the structure of the IR produced by lower_parallel_tasks(),
// we'll use a trick: we'll define a WEAK_INLINE function, _halide_hexagon_do_par_for,
// which simply encapsulates the unlock()/do_par_for()/lock() sequences, and swap out
// the call here. Since it is inlined, and since uses_hvx_var gets substituted at the end,
// we end up with LLVM IR that properly includes (or omits) the unlock/lock pair depending
// on the final value of uses_hvx_var in this scope.

internal_assert(op->call_type == Call::Extern);
internal_assert(op->args.size() == 4);

std::vector<Expr> args = op->args;
args.push_back(cast<int>(uses_hvx_var));

return Call::make(Int(32), "_halide_hexagon_do_par_for", args, Call::Extern);
}
return op;
}

Expand Down
Loading