
[ANSOR] Auto-scheduler tutorial for GPU and necessary refactor/fix #6512

Merged: 13 commits merged into apache:master on Sep 19, 2020

Conversation

merrymercy (Member) commented Sep 18, 2020

  • Add a tutorial on auto-scheduling a subgraph for GPU (tutorials/auto_scheduler/tune_conv2d_layer_cuda.py)
  • Refactor evolutionary search
  • Fix MutateComputeLocation
    • In the old implementation we reused InitChangeComputeLocation, but this was wrong because it always appends new steps and makes the transform history grow too long. In the new implementation we only mutate the existing steps, so the number of steps stays the same (see the sketch below).
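
A minimal, hypothetical Python sketch of this behavior change (the real implementation lives in the C++ sketch-policy code; the class and function names below are made up purely for illustration):

import copy
import random
from dataclasses import dataclass, field

@dataclass
class State:
    # Hypothetical stand-in for the auto-scheduler's loop state:
    # a list of (step_kind, payload) transform steps.
    transform_steps: list = field(default_factory=list)

def mutate_compute_location_old(state, candidate_loops):
    # Old behavior: always append a brand-new compute-location step,
    # so the transform history grows on every mutation round.
    new_state = copy.deepcopy(state)
    new_state.transform_steps.append(("compute_at", random.choice(candidate_loops)))
    return new_state

def mutate_compute_location_new(state, candidate_loops):
    # New behavior: rewrite an existing compute-location step in place,
    # so len(transform_steps) stays constant.
    new_state = copy.deepcopy(state)
    idxs = [i for i, (kind, _) in enumerate(new_state.transform_steps)
            if kind == "compute_at"]
    if idxs:
        new_state.transform_steps[random.choice(idxs)] = (
            "compute_at", random.choice(candidate_loops))
    return new_state

# The invariant the fix establishes: mutation does not change the step count.
s = State([("split", 8), ("compute_at", "i0"), ("reorder", None)])
assert len(mutate_compute_location_new(s, ["i1", "i2"]).transform_steps) == len(s.transform_steps)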

@merrymercy (Member, Author)

python/tvm/auto_scheduler/measure.py (review comment resolved)
Comment on lines 174 to 190
######################################################################
# .. note::
# We cannot run the line above because of the conflict between
# python's multiprocessing and tvm's thread pool.
# After running a tvm generated binary, the python multiprocessing library
# will hang forever. You have to make sure that you don't run any tvm
# generated binaries before calling the auto-scheduler's search.
# To run the function above, you should comment out all code in
# the "Check correctness and evaluate performance" section.
#
# You should be careful about this problem in your applications.
# There are other workarounds for this problem.
# For example, you can start a new thread/process (with the builtin python library
# threading or multiprocessing) and run the tvm binaries in the new thread/process.
# This provides an isolation and avoids the conflict in the main thread/process.
# You can also use :any:`auto_scheduler.measure.LocalRPCMeasureContext` for auto-scheduler,
# as shown in the GPU tutorial (:ref:`auto-scheduler-conv-gpu`).
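
The process-isolation workaround mentioned in the quoted note can be sketched as follows. This is only a minimal, runnable illustration of the pattern; run_tvm_binary is a placeholder (it just times a dummy workload here) standing in for building and benchmarking a TVM-generated binary:

import multiprocessing
import time

def run_tvm_binary(result_queue):
    # Placeholder: a real script would build the schedule with tvm.build(...)
    # and time it; we time a dummy workload to keep the sketch self-contained.
    start = time.time()
    sum(i * i for i in range(10 ** 6))
    result_queue.put(time.time() - start)

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=run_tvm_binary, args=(queue,))
    proc.start()
    proc.join()
    print("Measured time: %.4f s" % queue.get())
    # The main process never executed a TVM-generated binary itself,
    # so calling the auto-scheduler's search afterwards remains safe.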
Contributor


This may still be confusing. I'm thinking it may be better to just delete the resume part of the CPU tutorials to keep it as simple as possible.

We can add a comment saying that LocalRPCMeasureContext is recommended to be used in all hardware targets in the GPU tutorial.
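
For reference, a sketch of the measurement setup the GPU tutorial adopts, assuming `task` (the conv2d search task) and `log_file` have already been created earlier in tune_conv2d_layer_cuda.py; the parameter values here are only illustrative:

from tvm import auto_scheduler

# `task` and `log_file` are assumed to be defined earlier in the tutorial.
measure_ctx = auto_scheduler.LocalRPCMeasureContext(min_repeat_ms=300)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=10,
    runner=measure_ctx.runner,  # measurements run inside a local RPC server,
                                # isolated from the main process
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
)
sch, args = auto_scheduler.auto_schedule(task, tuning_options=tune_option)
del measure_ctx  # shut down the RPC server/tracker started by the context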

Member Author


People are likely to run into this problem in their own applications, so it is worth making them aware of it in the tutorial.

Member Author


Actually, I removed LocalRPCMeasureContext to make the major part of the tutorial simpler.
Because resuming a search is a very advanced usage that most people don't need, it is fine for that part to be more complicated. But all sections above it should be as simple as possible.

tutorials/auto_scheduler/tune_conv2d_layer_cuda.py (review comment resolved)
tutorials/auto_scheduler/tune_matmul_x86.py (review comments resolved)
Comment on lines +192 to +193
# You can also use :any:`auto_scheduler.measure.LocalRPCMeasureContext` for auto-scheduler,
# as shown in the GPU tutorial (:ref:`auto-scheduler-conv-gpu`).
Contributor


Intuitively, if there's no obvious performance impact, we should use the RPC runner on both CPU and GPU, so it'd be better to mention why we didn't use it in this tutorial.

Contributor


I missed the comment from @jcf94 when reviewing this part. I agree that this may be confusing, but I prefer to keep the resume part; otherwise users will still encounter this issue when they are trying to resume the search. For example, one may write a script to do the following:

sch, args = do_search(task, log_file)
perf = evaluate_result(sch, args)
while perf < goal:
    sch, args = resume_search(task, log_file)
    perf = evaluate_result(sch, args)

@merrymercy (Member, Author)

Comments are addressed. @comaniac @jcf94 @tqchen

comaniac merged commit eee04c0 into apache:master on Sep 19, 2020
@comaniac (Contributor)

Thanks @merrymercy @jcf94

merrymercy deleted the ansor-gpu-tutorial branch on September 20, 2020
TusharKanekiDey pushed a commit to TusharKanekiDey/tvm that referenced this pull request Oct 13, 2020
[ANSOR] Auto-scheduler tutorial for GPU and necessary refactor/fix (apache#6512)

* add gpu tutorial
* refactor mutation in evolutionary search
* update
* update double matmul
* fix lint
* add double matmul test
* fix mutate compute location
* fix sketch search policy
* fix lint
* update
* address comments
* fix PruneInvalidStates