Add GPU autoscheduler #6856
Conversation
Step 1: please run …
Yeah, there really is a lot of overlap with Adams2019 here. Would it be tractable to merge these changes into Adams2019, or at least have shared code between them?
.unroll(r, 8);
A.in().compute_at(prod, r).vectorize(_0).unroll(_1);
B.in().compute_at(prod, r).vectorize(_0).unroll(_1);
if (!auto_schedule) {
Here and elsewhere: as of #6838, you should no longer examine the auto_schedule GeneratorParam directly; instead, please call the using_autoscheduler() method.
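A sketch of the suggested change for the quoted schedule (the manual-schedule body is elided):

// Before: examining the GeneratorParam directly
if (!auto_schedule) {
    // ... manual schedule ...
}

// After: query the generator instead
if (!using_autoscheduler()) {
    // ... manual schedule ...
}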
$(BIN)/cost_model/%.a: $(BIN)/cost_model.generator
@mkdir -p $(@D)
$^ -g $* -o $(BIN)/cost_model -f $* target=$(HL_TARGET)-no_runtime auto_schedule=false enable_debug_output=$(ENABLE_DEBUG_OUTPUT) -e stmt,static_library,h,assembly
Here and elsewhere: auto_schedule=false is no longer supported and should be removed.
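Applied to the cost_model rule above, the recipe would simply drop that flag, e.g.:

$^ -g $* -o $(BIN)/cost_model -f $* target=$(HL_TARGET)-no_runtime enable_debug_output=$(ENABLE_DEBUG_OUTPUT) -e stmt,static_library,h,assembly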
$(BIN)/%/demo.a: $(GENERATOR_BIN)/demo.generator $(BIN)/libautoschedule_anderson2021.$(SHARED_EXT)
@mkdir -p $(@D)
HL_WEIGHTS_DIR=$(SRC)/baseline.weights \
$(GENERATOR_BIN)/demo.generator -g demo -o $(@D) -f demo target=$* auto_schedule=true -p $(BIN)/libautoschedule_anderson2021.$(SHARED_EXT) -s Anderson2021
Here and elsewhere: auto_schedule=true -s Anderson2021 is no longer supported syntax. Instead, specify autoscheduler=Anderson2021 to replace both of these.
Branch updated from 5b92284 to bb571dd.
Some of the files (e.g. …). Unfortunately, most of the other files (e.g. …) …
Yep!
No worries then.
@steven-johnson The buildbots are green now.
This looks pretty great, thanks for doing all this work to get this landed! Just a handful of mostly style nits and such, with a few relatively minor things that need addressing (and a few things that would be nice for someone to work on as a followup).
IntrusivePtr<State> optimal_schedule(int beam_size);
};

#ifdef HALIDE_ALLOW_LEGACY_AUTOSCHEDULER_API |
FYI: HALIDE_ALLOW_LEGACY_AUTOSCHEDULER_API is now removed in Halide main, so this should just be removed here and elsewhere. (We can do this as a followup PR if you like.)
That said, here are the (legacy) env vars you can still use when HALIDE_ALLOW_LEGACY_AUTOSCHEDULER_API is defined:

HL_BEAM_SIZE |
FYI: since HALIDE_ALLOW_LEGACY_AUTOSCHEDULER_API is now removed in Halide main, most of these env vars are probably no longer actually in use (settings would be controlled by GeneratorParams instead), so this should probably get pruned (a subsequent PR is fine).
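For instance, instead of setting HL_BEAM_SIZE in the environment, the equivalent knob would presumably be passed on the generator command line as an autoscheduler GeneratorParam, along the lines of the following (the exact sub-parameter names depend on what Anderson2021Params exposes):

autoscheduler=Anderson2021 autoscheduler.beam_size=32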
typedef PerfectHashMap<FunctionDAG::Node::Stage, ScheduleFeatures> StageMapOfScheduleFeatures;

void find_and_apply_schedule(FunctionDAG &dag, const std::vector<Function> &outputs, const Anderson2021Params &params, const Target &target, CostModel *cost_model, int beam_size, StageMapOfScheduleFeatures *schedule_features);
Style nit: this is an awfully long line
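For illustration, one possible wrapping of the declaration above (clang-format would likely produce something similar):

void find_and_apply_schedule(FunctionDAG &dag,
                             const std::vector<Function> &outputs,
                             const Anderson2021Params &params,
                             const Target &target,
                             CostModel *cost_model,
                             int beam_size,
                             StageMapOfScheduleFeatures *schedule_features);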
"Incorrect size for pipeline features"); | ||
int num_stages = 0;
for (const auto &n : dag.nodes) {
    if (!n.is_input) num_stages += (int)n.stages.size();
Style nit, here and elsewhere: generally, Halide always encloses if clauses in braces, even for a single line:
if (...) {
...
}
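Applied to the line quoted above, that would be:

if (!n.is_input) {
    num_stages += (int)n.stages.size();
}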
*(cost_ptrs(i)) = dst(i);
if (std::isnan(dst(i))) {
    any_nans = true;
    aslog(0) << "Prediction " << i << " is NaN. True runtime is " << true_runtimes(i) << "\n";
Generally, we prefer the autoschedulers to be 'quiet' by default -- do we really want/need these to be aslog(0) (vs aslog(1), etc.)?
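For example, raising the threshold so the message only shows up when verbose logging is requested (assuming level 1 is appropriate here):

aslog(1) << "Prediction " << i << " is NaN. True runtime is " << true_runtimes(i) << "\n";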
StageMap<StageMap<bool>> descendants;
root->get_stages_computed_in_each_compute_root_loop(descendants);

aslog(0) << "BEGIN compute locations\n";
Does this need to be aslog(0)?
namespace Internal {
namespace Autoscheduler {

#define MAX_THREADS_PER_BLOCK 1024
Prefer constexpr int for consts like this.
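I.e., roughly:

constexpr int MAX_THREADS_PER_BLOCK = 1024;  // was: #define MAX_THREADS_PER_BLOCK 1024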
#include <cstdint>
#include <vector>

using std::vector;
Never, ever, ever put a using into a .h file in the global namespace.
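If the alias is genuinely useful in this header, one option (sketch only) is to scope it inside the project's namespaces rather than the global namespace; dropping the alias and writing std::vector at the use sites works just as well:

namespace Halide {
namespace Internal {
namespace Autoscheduler {

using std::vector;  // visible only inside this namespace, not globally

}  // namespace Autoscheduler
}  // namespace Internal
}  // namespace Halide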
# benchmarked serially.
BATCH_SIZE=80
EPOCHS=200
NUM_GPUS=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
Yes, definitely, NVidia is not the only GPU vendor out there :-)
echo "Predict only mode: ON"
fi

DEFAULT_SAMPLES_DIR_NAME="${SAMPLES_DIR:-autotuned_samples}"
Hard-to-parse error messages are not good, please fix :-)
}

cursor = 0;
cost_per_stage_ptrs.clear();
I might be wrong, but it seems that the buildbots are not running the GPU autoscheduler-related tests?
For example, I pulled this branch and tried running anderson2021_test_apps_autoscheduler (i.e., test/autoschedulers/anderson2021/test.cpp), but it instantly fails, complaining that access to cost_per_stage_ptrs is out of bounds. I commented out this line (cost_per_stage_ptrs.clear();) and it seems to work again.
I might be wrong, but it seems that the buildbots are not running the GPU autoscheduler-related tests?
That's correct and as-intended, since this branch hasn't landed yet. The buildbots usually aren't configured to test major new features before they have landed. This PR is close, but has a number of nits that need fixing first.
// Make the vectorized dimension of the inner loop 32 (or as
// close as possible)
int64_t inner_extent = std::min(c->size[vectorized_loop_index], (int64_t)32);

if (c->stage->index == 0) {
    vector<int64_t> tiling(c->node->dimensions, 1);

    // Split into parallelized and serial
    c = c->parallelize_in_tiles(tiling, loop_nest, params, target, true, false);

    if (vectorized_loop_index >= 0) {
        tiling[vectorized_loop_index] = inner_extent;
    }
It seems that vectorized_loop_index == -1 is not checked for here in line 222 (and it seems that -1 is a legal value here, given the subsequent check in line 230), which leads to a crash when running test/autoschedulers/anderson2021/test.cpp. I changed

int64_t inner_extent = std::min(c->size[vectorized_loop_index], (int64_t)32);

to

int64_t inner_extent = vectorized_loop_index != -1 ? std::min(c->size[vectorized_loop_index], (int64_t)32) : (int64_t)32;

to fix this, but I am not sure if this is right.
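The same guard could also be written as an explicit branch (equivalent to the ternary above, and consistent with the brace-style note earlier in this review):

int64_t inner_extent = 32;
if (vectorized_loop_index != -1) {
    inner_extent = std::min(c->size[vectorized_loop_index], (int64_t)32);
}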
@steven-johnson Thanks for the review! Made all the requested changes. Thanks to all the others who left comments/suggestions too.
This appears to be a duplicate of apps/images/rgb_small.png
@@ -6,10 +6,10 @@ using namespace Halide;

class ConvRelu : public Halide::Generator<ConvRelu> {
public:
    Input<Buffer<float, 4>> input{"input"};
These changes look unintentional (same for the file below)
Fixed here: #7475
Add Anderson2021 GPU autoscheduler
This is a draft PR to merge the anderson2021 GPU autoscheduler (#5602). It includes the code (there is some overlap with adams2019, but it will probably be a substantial amount of work to deduplicate them), some tests, utility scripts for generating data/statistics/etc. (these can be removed if desired), and baseline weights (trained on a V100).
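For reference, a minimal sketch of driving this autoscheduler from the C++ API, loosely modeled on test/autoschedulers/anderson2021/test.cpp. The load_plugin / AutoschedulerParams / apply_autoscheduler calls reflect the current plugin API in Halide main and are assumptions here rather than code from this PR; the plugin path and estimates are placeholders.

#include "Halide.h"

using namespace Halide;

int main() {
    // Assumption: the Anderson2021 autoscheduler is built as a plugin and loaded
    // at runtime, like the other autoschedulers (adjust the library name/path).
    load_plugin("libautoschedule_anderson2021.so");

    // A toy pipeline with estimates so the autoscheduler has bounds to work with.
    Func f("f");
    Var x("x"), y("y");
    f(x, y) = x + y;
    f.set_estimate(x, 0, 1024).set_estimate(y, 0, 1024);

    // Anderson2021 schedules for GPUs, so the target needs a GPU feature.
    Target target = get_target_from_environment().with_feature(Target::CUDA);

    Pipeline p(f);
    // Assumption: AutoschedulerParams / apply_autoscheduler as in current Halide main.
    AutoschedulerParams params("Anderson2021");
    p.apply_autoscheduler(target, params);
    return 0;
}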