Move parallel/async lowering from LLVM codegen to a standard Halide IR lowering pass. #6195

Merged on Dec 14, 2021 (85 commits)
Commits
0b7bd25
First cut at factoring parallel task compilation, including closure
Jul 22, 2021
e3a14b0
Fix formating that got munged by emacs somehow.
Jul 23, 2021
e06a2a8
Merge branch 'master' into factor_parallel_codegen
Jul 23, 2021
7e90c97
Checkpoint progress.
Jul 30, 2021
c9ab059
Small fixes.
Jul 31, 2021
a6d9f1d
Checkpoint progress.
Aug 1, 2021
a2a92a4
Checkpoint preogress.
Aug 2, 2021
8375961
Checkpoint progress.
Aug 2, 2021
fbb05ab
Checkpoint progress. Debugging code will be removed.
Aug 3, 2021
f8de53e
Try a fix for make_typed_struct in C++ codegen.
Aug 3, 2021
c5d902e
Another attempt to fix C++ codegen.
Aug 3, 2021
b1f8c6c
Another C codegen fix.
Aug 4, 2021
5eef291
Checkpoint progress.
Aug 4, 2021
3391b6d
Merge branch 'master' into factor_parallel_codegen
Aug 4, 2021
ade53a2
Use make_typed_struct rather than make_struct to construct
Aug 6, 2021
980c447
Checkpoint.
Aug 11, 2021
7071d40
Uniqueify closure names because LLVM was doing that to function names.
Aug 11, 2021
9b5c399
Small formatting cleanups.
Aug 11, 2021
ffb2887
Get generated C++ to compile via a combination of fixing types and
Aug 12, 2021
91d7130
Typo fix to a typo fix.
Aug 12, 2021
6785903
Merge branch 'master' into factor_parallel_codegen
Aug 12, 2021
c711bb1
Restore inadvertently deleted code.
Aug 12, 2021
d6ac3e9
Rename make_struct_type to declare_struct_type.
Aug 12, 2021
f157165
Add new file to CMake.
Aug 12, 2021
3c33517
Add fixes for Hexagon offload and any passes that might add additional
Aug 13, 2021
c4f517b
Add comment with a bit of info for the future..
Aug 13, 2021
69db2ad
Typo fix.
Aug 14, 2021
b5fd7f9
Don't duplicate the closure call to test the error return.
Aug 14, 2021
2e3660d
Use _ucon in C++ code to get rid of constness casting ugliness.
Aug 16, 2021
b8272fc
Merge branch 'master' into factor_parallel_codegen
Aug 17, 2021
4f345e0
Change resolve_function_name intrinsic to use a Call node to designate
Aug 18, 2021
ea9fa9a
Small C++ backend output formating change.
Aug 19, 2021
9cdb347
Add halide_semaphore_acquire_t as a well known type for use inside co…
Aug 19, 2021
147f381
Add handling for halide_semaphore_t allocation.
Aug 19, 2021
a71ecf1
Fix type for halide_semaphore_t.
Aug 19, 2021
d68ca8c
Reapply C++ backend formatting fix.
Aug 19, 2021
f00a4e5
Merge branch 'master' into factor_parallel_codegen
Aug 26, 2021
c0526eb
Add support for calling legacy halide_do_par_for runtime routine in
Aug 28, 2021
739a358
Formatting fixes.
Aug 28, 2021
dd0f8ed
Format and tidy fixes.
Aug 28, 2021
1fe603b
Attempt to pass formatting check.
Aug 28, 2021
e8f87ac
Merge branch 'master' into factor_parallel_codegen
Aug 30, 2021
71e5612
Merge branch 'master' into factor_parallel_codegen
Sep 14, 2021
2f24967
Fix last set of test failures.
Sep 17, 2021
430cab2
Formatting whitespace fixes.
Sep 17, 2021
d37a16c
Update comments.
Sep 17, 2021
6cefab2
Merge branch 'master' into factor_parallel_codegen
Sep 23, 2021
9a3d926
Attempt to fix pointer cast error with some versions of LLVM.
Sep 28, 2021
1aac285
Another attempt at fixing bool compatibility casting.
Sep 29, 2021
e8e296b
Another iteration.
Sep 29, 2021
2da926b
Merge branch 'master' into factor_parallel_codegen
Sep 29, 2021
3946ccd
Remove likely useless extern argument check logic.
Oct 1, 2021
687b71b
Merge branch 'master' into factor_parallel_codegen
Oct 2, 2021
c87399c
Merge branch 'master' into factor_parallel_codegen
Nov 5, 2021
00a0715
Add hacky fix for losing global variables.
dsharletg Nov 15, 2021
5d93f1e
Comment typo fixes.
Nov 19, 2021
f49f800
Merge branch 'master' into factor_parallel_codegen
Nov 19, 2021
5733306
Merge branch 'master' into factor_parallel_codegen
steven-johnson Nov 22, 2021
114d209
Merge branch 'master' into factor_parallel_codegen
steven-johnson Nov 23, 2021
df247f9
Remove no-longer-used Closure code from Codegen_Internal
steven-johnson Nov 23, 2021
b9ac7d6
Remove unused MayBlock visitor class
steven-johnson Nov 23, 2021
6528ba6
clang-tidy
steven-johnson Nov 23, 2021
2fd4c9f
Attempt to fix parallel offloads for HVX
steven-johnson Nov 23, 2021
e71bb81
Merge branch 'factor_parallel_codegen' of https://github.com/halide/H…
steven-johnson Nov 23, 2021
e64651b
Merge branch 'master' into factor_parallel_codegen
steven-johnson Nov 29, 2021
7d2c2c6
Update parallel_nested_1.cpp
steven-johnson Nov 29, 2021
e02571b
Augment Closure debugging
steven-johnson Nov 30, 2021
7eb9f5f
Add some std::move usage
steven-johnson Nov 30, 2021
f02b1e9
Merge branch 'master' into factor_parallel_codegen
steven-johnson Dec 1, 2021
0faca07
Fix hvx lock/unlock semantics for PR #6457 (#6462)
steven-johnson Dec 2, 2021
d180598
Merge branch 'master' into factor_parallel_codegen
steven-johnson Dec 2, 2021
9af64ae
Sort IntrinsicOp and corresponding names
steven-johnson Dec 2, 2021
25b7b77
Remove unused `is_const_pointer()` function
steven-johnson Dec 2, 2021
387c58f
Minor hygiene in LowerParallelTasks
steven-johnson Dec 2, 2021
24a6eb2
Merge branch 'master' into factor_parallel_codegen
steven-johnson Dec 2, 2021
a1d267e
use Closure::include
steven-johnson Dec 2, 2021
c21e701
Switch to PureIntrinsics per review feedback.
Dec 3, 2021
b6ba514
Minor cleanup of parallel refactor intrinsics (#6465)
steven-johnson Dec 6, 2021
2923246
Update CodeGen_C.cpp
steven-johnson Dec 6, 2021
5764d12
Merge branch 'master' into factor_parallel_codegen
steven-johnson Dec 8, 2021
64db40f
Remove 'foo.buffer' from Closure entirely
steven-johnson Dec 8, 2021
bdd7857
Update LowerParallelTasks.cpp
steven-johnson Dec 8, 2021
5cf15ea
Keep track of task_parent inside LowerParallelTasks; remove no-longer…
steven-johnson Dec 9, 2021
57412df
Fix potential issue with additional LoweredFuncs (#6490)
steven-johnson Dec 9, 2021
c70badd
factor parallel codegen with fewer intrinsics (#6487)
abadams Dec 13, 2021
2 changes: 2 additions & 0 deletions Makefile
@@ -491,6 +491,7 @@ SOURCE_FILES = \
LLVM_Runtime_Linker.cpp \
LoopCarry.cpp \
Lower.cpp \
LowerParallelTasks.cpp \
LowerWarpShuffles.cpp \
MatlabWrapper.cpp \
Memoization.cpp \
@@ -669,6 +670,7 @@ HEADER_FILES = \
LLVM_Runtime_Linker.h \
LoopCarry.h \
Lower.h \
LowerParallelTasks.h \
LowerWarpShuffles.h \
MainPage.h \
MatlabWrapper.h \
2 changes: 1 addition & 1 deletion src/AsyncProducers.cpp
@@ -333,7 +333,7 @@ class ForkAsyncProducers : public IRMutator {
vector<Expr> sema_vars;
for (int i = 0; i < consumes.count; i++) {
sema_names.push_back(op->name + ".semaphore_" + std::to_string(i));
sema_vars.push_back(Variable::make(Handle(), sema_names.back()));
sema_vars.push_back(Variable::make(type_of<halide_semaphore_t *>(), sema_names.back()));
}

Stmt producer = GenerateProducerBody(op->name, sema_vars, cloned_acquires).mutate(body);
2 changes: 2 additions & 0 deletions src/CMakeLists.txt
@@ -98,6 +98,7 @@ set(HEADER_FILES
LLVM_Runtime_Linker.h
LoopCarry.h
Lower.h
LowerParallelTasks.h
LowerWarpShuffles.h
MainPage.h
MatlabWrapper.h
@@ -258,6 +259,7 @@ set(SOURCE_FILES
LLVM_Runtime_Linker.cpp
LoopCarry.cpp
Lower.cpp
LowerParallelTasks.cpp
LowerWarpShuffles.cpp
MatlabWrapper.cpp
Memoization.cpp
88 changes: 84 additions & 4 deletions src/Closure.cpp
@@ -1,11 +1,18 @@
#include "Closure.h"
#include "Debug.h"
#include "ExprUsesVar.h"
#include "IRMutator.h"
#include "IROperator.h"

namespace Halide {
namespace Internal {

using std::string;

namespace {
constexpr int DBG = 3;
} // namespace

void Closure::include(const Stmt &s, const string &loop_variable) {
if (!loop_variable.empty()) {
ignore.push(loop_variable);
@@ -38,7 +45,7 @@ void Closure::visit(const For *op) {
void Closure::found_buffer_ref(const string &name, Type type,
bool read, bool written, const Halide::Buffer<> &image) {
if (!ignore.contains(name)) {
debug(3) << "Adding buffer " << name << " to closure\n";
debug(DBG) << "Adding buffer " << name << " to closure:\n";
Buffer &ref = buffers[name];
ref.type = type.element_of(); // TODO: Validate type is the same as existing refs?
ref.read = ref.read || read;
@@ -49,8 +56,15 @@ void Closure::found_buffer_ref(const string &name, Type type,
ref.size = image.size_in_bytes();
ref.dimensions = image.dimensions();
}
debug(DBG) << " "
<< " t=" << ref.type
<< " d=" << (int)ref.dimensions
<< " r=" << ref.read
<< " w=" << ref.write
<< " mt=" << (int)ref.memory_type
<< " sz=" << ref.size << "\n";
} else {
debug(3) << "Not adding " << name << " to closure\n";
debug(DBG) << "Not adding buffer " << name << " to closure\n";
}
}

@@ -81,9 +95,9 @@ void Closure::visit(const Allocate *op) {

void Closure::visit(const Variable *op) {
if (ignore.contains(op->name)) {
debug(3) << "Not adding " << op->name << " to closure\n";
debug(DBG) << "Not adding var " << op->name << " to closure\n";
} else {
debug(3) << "Adding " << op->name << " to closure\n";
debug(DBG) << "Adding var " << op->name << " to closure\n";
vars[op->name] = op->type;
}
}
@@ -95,5 +109,71 @@ void Closure::visit(const Atomic *op) {
op->body.accept(this);
}

Expr Closure::pack_into_struct() const {
std::vector<Expr> elements;

for (const auto &b : buffers) {
Expr ptr_var = Variable::make(type_of<void *>(), b.first);
elements.emplace_back(ptr_var);
}
for (const auto &v : vars) {
Expr var = Variable::make(v.second, v.first);
elements.emplace_back(var);
}

// Sort by decreasing size, to guarantee the struct is densely packed in
// memory. We don't actually rely on this, it's just nice to have.
std::stable_sort(elements.begin(), elements.end(),
[&](const Expr &a, const Expr &b) {
return a.type().bytes() > b.type().bytes();
});

Expr result = Call::make(Handle(),
Call::make_struct, elements, Call::Intrinsic);
return result;
}

Stmt Closure::unpack_from_struct(const Expr &e, const Stmt &s) const {
// Use the struct-packing code just to make sure the order of elements is
// the same.
Expr packed = pack_into_struct();

// Make a prototype of the packed struct
class ReplaceCallArgsWithZero : public IRMutator {
public:
using IRMutator::mutate;
Expr mutate(const Expr &e) override {
if (!e.as<Call>()) {
return make_zero(e.type());
} else {
return IRMutator::mutate(e);
}
}
} replacer;
string prototype_name = unique_name("closure_prototype");
Expr prototype = replacer.mutate(packed);
Expr prototype_var = Variable::make(Handle(), prototype_name);

const Call *c = packed.as<Call>();

Stmt result = s;
for (int idx = (int)c->args.size() - 1; idx >= 0; idx--) {
Expr arg = c->args[idx];
const Variable *var = arg.as<Variable>();
Expr val = Call::make(var->type,
Call::load_typed_struct_member,
{e, prototype_var, idx},
Call::Intrinsic);
if (stmt_uses_var(result, var->name)) {
// If a closure is generated for multiple consuming blocks of IR,
// then some of those blocks might only need some of the fields.
result = LetStmt::make(var->name, val, result);
}
}
result = LetStmt::make(prototype_name, prototype, result);

return result;
}

} // namespace Internal
} // namespace Halide
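
The ordering step in `pack_into_struct()` can be illustrated outside Halide. The following is a minimal sketch, not the PR's code: `Field`, `order_fields`, and `field_index` are hypothetical names, and a plain byte count stands in for `Expr::type().bytes()`. It shows why a stable sort by decreasing size gives the packer and the unpacker the same slot indices when both start from the same input order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one closure element. Halide itself sorts Exprs
// and compares a.type().bytes() with b.type().bytes().
struct Field {
    std::string name;
    int bytes;
};

// Order fields by decreasing size. std::stable_sort keeps equal-sized fields
// in their incoming order, so two calls on the same input agree on indices.
std::vector<Field> order_fields(std::vector<Field> fields) {
    for (const Field &f : fields) {
        assert(f.bytes > 0 && "field sizes must be positive");
    }
    std::stable_sort(fields.begin(), fields.end(),
                     [](const Field &a, const Field &b) {
                         return a.bytes > b.bytes;
                     });
    return fields;
}

// Slot index of a named field in the ordered layout, or -1 if it is absent.
int field_index(const std::vector<Field> &ordered, const std::string &name) {
    for (size_t i = 0; i < ordered.size(); i++) {
        if (ordered[i].name == name) {
            return (int)i;
        }
    }
    return -1;
}
```

In the diff, `unpack_from_struct()` gets the same effect by calling `pack_into_struct()` again and reading the argument order from the resulting call, rather than recomputing the order by hand.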
10 changes: 9 additions & 1 deletion src/Closure.h
@@ -90,11 +90,19 @@ class Closure : public IRVisitor {
**/
void include(const Stmt &s, const std::string &loop_variable = "");

/** External variables referenced. */
/** External variables referenced. There's code that assumes iterating over
* this repeatedly gives a consistent order, so don't swap out the data type
* for something non-deterministic. */
std::map<std::string, Type> vars;

/** External allocations referenced. */
std::map<std::string, Buffer> buffers;

/** Pack a closure into a struct. */
Expr pack_into_struct() const;

/** Unpack a closure around a Stmt, putting all the names in scope. */
Stmt unpack_from_struct(const Expr &, const Stmt &) const;
};

} // namespace Internal
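
The new comment on `vars` warns against swapping in a non-deterministic container. A small sketch of the property being relied on, assuming a hypothetical helper `slot_of` and an `int` payload in place of Halide's `Type`: `std::map` iterates in key order regardless of insertion order, so two independently built maps with the same keys assign every name the same position.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>

// Hypothetical helper: the position a captured name would get if slots were
// assigned by iterating the map from begin() to end().
int slot_of(const std::map<std::string, int> &vars, const std::string &name) {
    auto it = vars.find(name);
    assert(it != vars.end() && "name must be captured in the closure");
    return (int)std::distance(vars.begin(), it);
}
```

With `std::unordered_map` the iteration order is unspecified and can change as the table grows, so positions derived this way would not be safe to share between the code that packs a closure and the code that unpacks it.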
73 changes: 57 additions & 16 deletions src/CodeGen_C.cpp
@@ -1820,7 +1820,7 @@ void CodeGen_C::compile(const LoweredFunc &f, const std::map<std::string, std::s
set_name_mangling_mode(name_mangling);

std::vector<std::string> namespaces;
std::string simple_name = extract_namespaces(f.name, namespaces);
std::string simple_name = c_print_name(extract_namespaces(f.name, namespaces), false);
if (!is_c_plus_plus_interface()) {
user_assert(namespaces.empty()) << "Namespace qualifiers not allowed on function name if not compiling with Target::CPlusPlusNameMangling.\n";
}
@@ -2024,7 +2024,13 @@ string CodeGen_C::print_assignment(Type t, const std::string &rhs) {
if (cached == cache.end()) {
id = unique_name('_');
const char *const_flag = output_kind == CPlusPlusImplementation ? "const " : "";
stream << get_indent() << print_type(t, AppendSpace) << const_flag << id << " = " << rhs << ";\n";
if (t.is_handle()) {
// Don't print void *, which might lose useful type information; just use auto.
stream << get_indent() << "auto *";
} else {
stream << get_indent() << print_type(t, AppendSpace);
}
stream << const_flag << id << " = " << rhs << ";\n";
cache[rhs] = id;
} else {
id = cached->second;
@@ -2051,7 +2057,12 @@ void CodeGen_C::close_scope(const std::string &comment) {
}

void CodeGen_C::visit(const Variable *op) {
id = print_name(op->name);
if (starts_with(op->name, "::")) {
// This is the name of a global, so we can't modify it.
id = op->name;
} else {
id = print_name(op->name);
}
}

void CodeGen_C::visit(const Cast *op) {
@@ -2369,12 +2380,19 @@ void CodeGen_C::visit(const Call *op) {
} else if (op->is_intrinsic(Call::alloca)) {
internal_assert(op->args.size() == 1);
internal_assert(op->type.is_handle());
const int64_t *sz = as_const_int(op->args[0]);
if (op->type == type_of<struct halide_buffer_t *>() &&
Call::as_intrinsic(op->args[0], {Call::size_of_halide_buffer_t})) {
stream << get_indent();
string buf_name = unique_name('b');
stream << "halide_buffer_t " << buf_name << ";\n";
rhs << "&" << buf_name;
} else if (op->type == type_of<struct halide_semaphore_t *>() &&
sz && *sz == 16) {
stream << get_indent();
string semaphore_name = unique_name("sema");
stream << "halide_semaphore_t " << semaphore_name << ";\n";
rhs << "&" << semaphore_name;
} else {
// Make a stack of uint64_ts
string size = print_expr(simplify((op->args[0] + 7) / 8));
@@ -2422,7 +2440,7 @@ void CodeGen_C::visit(const Call *op) {
rhs << shape_name;
} else {
// Emit a declaration like:
// struct {const int f_0, const char f_1, const int f_2} foo = {3, 'c', 4};
// struct {int f_0, int f_1, char f_2} foo = {3, 4, 'c'};

// Get the args
vector<string> values;
@@ -2433,7 +2451,7 @@ void CodeGen_C::visit(const Call *op) {
// List the types.
indent++;
for (size_t i = 0; i < op->args.size(); i++) {
stream << get_indent() << "const " << print_type(op->args[i].type()) << " f_" << i << ";\n";
stream << get_indent() << print_type(op->args[i].type()) << " f_" << i << ";\n";
}
indent--;
string struct_name = unique_name('s');
@@ -2460,6 +2478,26 @@ void CodeGen_C::visit(const Call *op) {
}
rhs << "(&" << struct_name << ")";
}
} else if (op->is_intrinsic(Call::load_typed_struct_member)) {
// Given a void * instance of a typed struct, an in-scope prototype
// struct of the same type, and the index of a slot, load the value of
// that slot.
//
// It is assumed that the slot index is valid for the given typed struct.
//
// TODO: this comment is replicated in CodeGen_LLVM and should be updated there too.
// TODO: https://github.com/halide/Halide/issues/6468

internal_assert(op->args.size() == 3);
std::string struct_instance = print_expr(op->args[0]);
std::string struct_prototype = print_expr(op->args[1]);
const int64_t *index = as_const_int(op->args[2]);
internal_assert(index != nullptr);
rhs << "((decltype(" << struct_prototype << "))"
<< struct_instance << ")->f_" << *index;
} else if (op->is_intrinsic(Call::get_user_context)) {
internal_assert(op->args.empty());
rhs << "_ucon";
} else if (op->is_intrinsic(Call::stringify)) {
// Rewrite to an snprintf
vector<string> printf_args;
@@ -2490,7 +2528,6 @@ void CodeGen_C::visit(const Call *op) {
stream << get_indent() << "char " << buf_name << "[1024];\n";
stream << get_indent() << "snprintf(" << buf_name << ", 1024, \"" << format_string << "\", " << with_commas(printf_args) << ");\n";
rhs << buf_name;

} else if (op->is_intrinsic(Call::register_destructor)) {
internal_assert(op->args.size() == 2);
const StringImm *fn = op->args[0].as<StringImm>();
@@ -2507,7 +2544,7 @@ void CodeGen_C::visit(const Call *op) {
<< "" << struct_name << "(void *ucon, void *a) : ucon(ucon), arg((void *)a) {} "
<< "~" << struct_name << "() { " << fn->value + "(ucon, arg); } "
<< "} " << instance_name << "(_ucon, " << arg << ");\n";
rhs << print_expr(0);
rhs << "(void *)nullptr";
} else if (op->is_intrinsic(Call::div_round_to_zero)) {
rhs << print_expr(op->args[0]) << " / " << print_expr(op->args[1]);
} else if (op->is_intrinsic(Call::mod_round_to_zero)) {
@@ -2710,9 +2747,10 @@ void CodeGen_C::visit(const Let *op) {
if (op->value.type().is_handle()) {
// The body might contain a Load that references this directly
// by name, so we can't rewrite the name.
stream << get_indent() << print_type(op->value.type())
<< " " << print_name(op->name)
<< " = " << id_value << ";\n";
std::string name = print_name(op->name);
stream << get_indent() << "auto "
<< name << " = " << id_value << ";\n";
stream << get_indent() << "halide_unused(" << name << ");\n";
} else {
Expr new_var = Variable::make(op->value.type(), id_value);
body = substitute(op->name, new_var, body);
@@ -2800,12 +2838,14 @@ void CodeGen_C::visit(const VectorReduce *op) {
void CodeGen_C::visit(const LetStmt *op) {
string id_value = print_expr(op->value);
Stmt body = op->body;

if (op->value.type().is_handle()) {
// The body might contain a Load or Store that references this
// directly by name, so we can't rewrite the name.
stream << get_indent() << print_type(op->value.type())
<< " " << print_name(op->name)
<< " = " << id_value << ";\n";
std::string name = print_name(op->name);
stream << get_indent() << "auto "
<< name << " = " << id_value << ";\n";
stream << get_indent() << "halide_unused(" << name << ");\n";
} else {
Expr new_var = Variable::make(op->value.type(), id_value);
body = substitute(op->name, new_var, body);
@@ -3221,8 +3261,9 @@ extern "C" {
HALIDE_FUNCTION_ATTRS
int test1(struct halide_buffer_t *_buf_buffer, float _alpha, int32_t _beta, void const *__user_context) {
void * const _ucon = const_cast<void *>(__user_context);
void *_0 = _halide_buffer_get_host(_buf_buffer);
void * _buf = _0;
auto *_0 = _halide_buffer_get_host(_buf_buffer);
auto _buf = _0;
halide_unused(_buf);
{
int64_t _1 = 43;
int64_t _2 = _1 * _beta;
@@ -3248,7 +3289,7 @@ int test1(struct halide_buffer_t *_buf_buffer, float _alpha, int32_t _beta, void
{
char b0[1024];
snprintf(b0, 1024, "%lld%s", (long long)(3), "\n");
char const *_8 = b0;
auto *_8 = b0;
halide_print(_ucon, _8);
int32_t _9 = 0;
int32_t _10 = return_second(_9, 3);
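
For the `load_typed_struct_member` case above, the C++ backend emits an expression of the form `((decltype(prototype))instance)->f_N`. A minimal sketch of that pattern follows. `ClosureFields` and `load_member_1` are hypothetical names; the generated code declares anonymous structs instead, and this sketch deliberately uses one named type for both the instance and the prototype so that the cast stays clearly well defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical closure layout. The f_0, f_1, ... field naming follows the
// make_struct output shown in the diff.
struct ClosureFields {
    int32_t f_0;
    int32_t f_1;
    char f_2;
};

// Load slot 1 from a type-erased closure pointer. The zeroed prototype is
// never read; only its static type is used, via decltype, to recover the
// pointer type that the void * was erased from.
int32_t load_member_1(void *instance) {
    assert(instance != nullptr);
    ClosureFields zeroed = {0, 0, 0};
    auto *prototype = &zeroed;
    return ((decltype(prototype))instance)->f_1;
}
```

This is also why the handle declarations in this file switch from `void *` to `auto`: if the prototype were declared as `void *`, `decltype` would have no struct type left to recover.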
24 changes: 24 additions & 0 deletions src/CodeGen_Hexagon.cpp
@@ -431,6 +431,30 @@ class InjectHVXLocks : public IRMutator {
}
Expr visit(const Call *op) override {
uses_hvx = uses_hvx || op->type.is_vector();

if (op->name == "halide_do_par_for") {
// If we see a call to halide_do_par_for() at this point, it should mean that
// this statement was produced via HexagonOffload calling lower_parallel_tasks()
// explicitly; in this case, we won't see any parallel For statements, since they've
// all been transformed into closures already. To mirror the pattern above,
// we need to wrap the halide_do_par_for() call with an unlock/lock pair, but
// that's hard to do in Halide IR (we'd need to produce a Stmt to enforce the ordering,
// and the resulting Stmt can't easily be substituted for the Expr here). Rather than
// make fragile assumptions about the structure of the IR produced by lower_parallel_tasks(),
// we'll use a trick: we'll define a WEAK_INLINE function, _halide_hexagon_do_par_for,
// which simply encapsulates the unlock()/do_par_for()/lock() sequences, and swap out
// the call here. Since it is inlined, and since uses_hvx_var gets substituted at the end,
// we end up with LLVM IR that properly includes (or omits) the unlock/lock pair depending
// on the final value of uses_hvx_var in this scope.

internal_assert(op->call_type == Call::Extern);
internal_assert(op->args.size() == 4);

std::vector<Expr> args = op->args;
args.push_back(cast<int>(uses_hvx_var));

return Call::make(Int(32), "_halide_hexagon_do_par_for", args, Call::Extern);
}
return op;
}
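
The comment above describes a wrapper that brackets the parallel-for call with an unlock/lock pair. The runtime side is not part of this hunk, so the following is only a sketch of the idea under stated assumptions: every `fake_*` function, `sketch_hexagon_do_par_for`, and the simplified task signature are hypothetical stand-ins, not the Halide runtime API, and whether the real wrapper re-locks after a failed task is not shown in the diff.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Event log so the call order can be inspected; purely illustrative.
std::vector<std::string> events;

// Hypothetical stand-ins for the runtime's HVX unlock/lock entry points.
void fake_hvx_unlock() { events.push_back("unlock"); }
void fake_hvx_lock() { events.push_back("lock"); }

// Hypothetical stand-in for the parallel-for entry point. It runs the task
// serially over [min, min + extent) and stops at the first nonzero result.
int fake_do_par_for(int (*task)(int, void *), int min, int extent, void *closure) {
    events.push_back("par_for");
    for (int i = min; i < min + extent; i++) {
        int err = task(i, closure);
        if (err != 0) {
            return err;
        }
    }
    return 0;
}

// Sketch of the wrapper: release HVX before fanning out and re-acquire it
// afterwards, but only when the trailing use_hvx argument is nonzero.
inline int sketch_hexagon_do_par_for(int (*task)(int, void *), int min,
                                     int extent, void *closure, int use_hvx) {
    assert(task != nullptr);
    if (use_hvx) {
        fake_hvx_unlock();
    }
    int result = fake_do_par_for(task, min, extent, closure);
    if (use_hvx) {
        fake_hvx_lock();
    }
    return result;
}

// Example task: add the loop index into an int accumulator passed as closure.
int add_index(int i, void *closure) {
    *static_cast<int *>(closure) += i;
    return 0;
}
```

Passing the flag as an ordinary argument is what lets the decision be deferred: per the comment, `uses_hvx_var` is substituted late, and once the inlined wrapper sees a constant the compiler can drop the unlock/lock pair entirely.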
