Major revamp to Halide 16.0 with Anderson2021 GPU autoscheduler #67

antonysigma · 2023-04-21T18:24:49Z

(Adding the task dependencies for my own reminder.)

Wait for the Halide 16.0 release.
Refactor the Halide::BoundaryConditions calls to use the new APIs;
Similarly, refactor the Generator::*-related code to use the Halide 16.0 APIs;
In algorithms/ladmm.py, ensure all NumPy matrices are in Fortran order (F-order) by default; this avoids the overhead of frequent C-order to F-order conversions in the (L-)ADMM iterations;
Similarly, ensure that Halide-accelerated linear operators, e.g. A_mask.cpython.so, write to the output buffers in F-order, not to orphan buffers that are immediately destroyed. This should fix the convergence failures that occur whenever implem='Halide' is set.
Wait until the Anderson2021 autoscheduler is ready for production (see: ASAN reports out-of-bounds read error in anderson2021_test_apps_autoscheduler halide/Halide#7606).
(Optional) Compile the Halide generators with C++20; this should cut the compile time in half thanks to the new C++20 Concepts feature;
(Optional) Reduce the code bloat in ladmm-iter-gen.cpp with the broadcast operator Halide::_.
Replace the Li2018 autoscheduler with Anderson2021: the latter makes far better use of the GPU cache and the shared memory in each SM.
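
For reference, a minimal NumPy sketch of the orphan-buffer problem described in the F-order tasks above. `apply_mask` is a hypothetical pure-Python stand-in for a Halide-accelerated operator such as A_mask.cpython.so; the real binding's signature is not shown in this issue, so treat the call shape as an assumption. The point is only that converting a C-order output to F-order makes a copy, so the write never reaches the caller's buffer.

```python
import numpy as np


def apply_mask(x, mask, out):
    """Hypothetical stand-in for a Halide operator: writes x * mask into `out` in place."""
    np.multiply(x, mask, out=out)
    return out


shape = (4, 3)
x = np.asfortranarray(np.arange(12, dtype=np.float32).reshape(shape))
mask = np.ones(shape, dtype=np.float32, order="F")

# Pitfall: the caller's output buffer is C-ordered. Converting it to F-order
# allocates a new array, so the operator writes into an orphan buffer.
c_out = np.zeros(shape, dtype=np.float32)  # C-order by default
orphan = np.asfortranarray(c_out)          # silent copy
apply_mask(x, mask, orphan)
print(np.shares_memory(orphan, c_out))     # False: c_out was never written
print(c_out.any())                         # False: still all zeros

# Fix: allocate in F-order from the start. The conversion is then a no-op
# and the operator writes directly into the caller's buffer.
out = np.zeros(shape, dtype=np.float32, order="F")
same = np.asfortranarray(out)              # no copy, same memory
apply_mask(x, mask, same)
print(np.shares_memory(same, out))         # True
print(np.array_equal(out, x))              # True: result landed in `out`
```

If the solver then iterates on `c_out`, it keeps seeing stale zeros, which would be consistent with the convergence failures noted above; allocating every matrix with order="F" up front removes both the copy and the lost write.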

antonysigma mentioned this issue Apr 24, 2023

Disable C-to-F casting in Halide-accelerated functions #68

Merged

antonysigma mentioned this issue Jul 14, 2023

Upgrade Halide toolchain to 14.0 #73

Merged

antonysigma mentioned this issue Aug 9, 2023

Backport the Anderson2021 autoscheduler #79

Merged

antonysigma mentioned this issue Oct 24, 2023

Anderson2021 autoscheduler triggers "producer_store_instances > 0" #93

Open
