[Ansor][AutoTVM v2.0] Phase 1: Add cache_read/cache_write steps #6107

jcf94 · 2020-07-22T02:07:21Z

For the full upstream plan, see Ansor RFC.

This PR adds the cache_read/cache_write steps to the Ansor auto_scheduler.

These steps insert extra stages into the original ComputeDAG, so the class member current_compute_dag in State is used to track the up-to-date ComputeDAG.

cc @merrymercy @comaniac @junrushao1994 @FrozenGene

comaniac

I've finished about half of the review.

python/tvm/auto_scheduler/loop_state.py

comaniac · 2020-07-22T03:22:59Z

python/tvm/auto_scheduler/loop_state.py

+        # Add a new stage will change all ops. But we still want to use the old ops to index stages,
+        # So we keep updating them and do not remove the old ops.


This description would be better as the docstring of this function.

The function name is misleading. This function does not insert a new stage. Instead, it updates the stage ID map by taking a new stage ID. A name like _update_stage_id_map might be better.

The return type of this function is improper, since it does nothing with added_op. Better to not return anything.

I feel this function should not be a standalone function. It is tightly coupled to the CacheRead/CacheWrite implementation. In this case, it is improper to call it explicitly.

This is an internal function that should not be called from outside, so a docstring seems unnecessary.

This function actually inserts a new stage into the ID map (and it should also update the IDs of the stages behind this one after the insert). I will think of a better name.

Good point.

This will be used in Rfactor, too.

Renamed to _apply_stage_id_offset; it plays a role similar to AttachMap::ApplyStageIdOffset.

I know internal functions don't require a docstring, but there's no harm in having one. At the very least, we should put the description at the beginning of the function.

_apply_stage_id_offset sounds better.

I didn't mean this function will be used by CacheRead/Write only. I meant this function is required when we implement a step that changes the stages, but there's no checking to enforce that. An ideal solution would be having a function to deal with stage changing and let CacheRead/CacheWrite/Rfactor call the function.

comaniac · 2020-07-22T03:36:45Z

python/tvm/auto_scheduler/loop_state.py

+        self.stage_id_map[added_op] = new_stage_id
+
+        # Update stage_id_map for new ops
+        self._update_stage_id_map()


It seems to me that this call will override the mapped stage ID of all ops from stages, but I don't fully understand it. Could you elaborate more on _insert_new_stage?

After cache_read/cache_write, all the stages behind the newly added stage are updated, so their ops differ from before.
This call keeps the original tensors/ops working and adds the new ops to the ID map.

So we can imagine the stage ID map will have all ops, including out-of-date and up-to-date ops. Whether or not an op is up-to-date, this function makes sure it points to the right stage.

If my understanding of the stage ID map above is correct, then I agree with you that this is a must. Meanwhile, we need to add more explanation.
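
To make the stale-versus-fresh picture concrete, here is a minimal Python sketch. It is not the code in this PR: `Op`, `shift_stage_ids`, and `refresh_stage_id_map` are hypothetical names, and it assumes that ops are hashable handles and that inserting a stage rebuilds the op objects of every later stage.

```python
class Op:
    """Stand-in for an operation handle; instances hash by identity."""

    def __init__(self, name):
        self.name = name


def shift_stage_ids(stage_id_map, new_stage_id):
    """Make room for a stage inserted at new_stage_id (hypothetical helper)."""
    for op, stage_id in list(stage_id_map.items()):
        if stage_id >= new_stage_id:
            stage_id_map[op] = stage_id + 1


def refresh_stage_id_map(stage_id_map, current_ops):
    """Register the ops of the up-to-date DAG; stale ops stay in the map."""
    for index, op in enumerate(current_ops):
        stage_id_map[op] = index


a, b, c = Op("A"), Op("B"), Op("C")
stage_id_map = {a: 0, b: 1, c: 2}

# Pretend a cache stage was inserted at index 1 and B, C were rebuilt as new objects.
a_cache, b_new, c_new = Op("A.cache"), Op("B"), Op("C")
shift_stage_ids(stage_id_map, 1)
refresh_stage_id_map(stage_id_map, [a, a_cache, b_new, c_new])

# The stale handle `b` and the fresh handle `b_new` now resolve to the same stage.
print(stage_id_map[b], stage_id_map[b_new])
```

Keeping the stale keys is what lets a caller continue to index stages with the tensors it created before the step was applied.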

src/auto_scheduler/compute_dag.h

comaniac · 2020-07-22T04:04:04Z

src/auto_scheduler/loop_state.h

+   * \brief Traverse through `stage_to_attach_iter` and `iter_to_attached_stages` map, add offset
+   * to stage indexes that are larger than the start_id. Used for steps that inserts net stages to


You mean "and update offset"?
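
For readers unfamiliar with the offset logic, the following Python sketch shows one way the two maps could be shifted. The map layouts (stage id to `(stage id, iterator id)`, and the reverse) and the function name are assumptions for illustration, as is the choice to shift `start_id` itself; the C++ doc comment above only says "larger than the start_id".

```python
def apply_stage_id_offset(stage_to_attach_iter, iter_to_attached_stages,
                          start_id, offset=1):
    """Return copies of both maps with stage ids >= start_id shifted by offset.

    Assumed layouts (hypothetical, for illustration only):
      stage_to_attach_iter:    {stage_id: (target_stage_id, iter_id)}
      iter_to_attached_stages: {(target_stage_id, iter_id): [stage_id, ...]}
    """
    def shift(stage_id):
        return stage_id + offset if stage_id >= start_id else stage_id

    new_stage_to_attach_iter = {
        shift(stage): (shift(target_stage), iter_id)
        for stage, (target_stage, iter_id) in stage_to_attach_iter.items()
    }
    new_iter_to_attached_stages = {
        (shift(target_stage), iter_id): [shift(s) for s in stages]
        for (target_stage, iter_id), stages in iter_to_attached_stages.items()
    }
    return new_stage_to_attach_iter, new_iter_to_attached_stages


# Stage 2 is attached at iterator 0 of stage 3; a new stage is inserted at index 1.
shifted_s2i, shifted_i2s = apply_stage_id_offset({2: (3, 0)}, {(3, 0): [2]}, start_id=1)
```

Both keys and values carry stage ids, so every position in both maps has to pass through the same shift.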

src/auto_scheduler/loop_state.h

tqchen · 2020-07-22T16:07:36Z

cc @jroesch @jwfromm @junrushao1994

comaniac

This PR is a good example to demonstrate how to add steps :)

src/auto_scheduler/transform_step.cc

junrushao

I agree with @comaniac on most points, and I have some additional comments. Thanks for the contribution!

junrushao · 2020-07-23T03:18:09Z

tests/python/unittest/test_auto_scheduler_loop_state.py

+    #
+    # Seems there's bug with the input/output tensor. Such multi outputs case
+    # should be unusual, so we make some hack on DoCacheWrite
+    # To be fixed in the future


I think maybe we should explicitly add a "TODO" mark here?

Same comment as above, we shouldn't use TODOs but we should track these in an Ansor stabilization PR

I think these TODOs are necessary; otherwise others may be confused when they see this code.
We do have a tracking issue for everything that needs to be handled: https://github.com/merrymercy/Ansor/issues/65

This is a bug in TVM and is outside the scope of Ansor. This feature is never used in Ansor. We just found this bug by accident during development, so we can just skip it.

src/auto_scheduler/transform_step.cc

jroesch

My high-level comments are that the documentation needs more clarity before we merge, and that we should move all TODOs into a larger Ansor tracking issue that tracks every bug/feature to be resolved before stabilization.

jroesch · 2020-07-23T20:45:29Z

python/tvm/auto_scheduler/loop_state.py

@@ -351,6 +351,72 @@ def compute_root(self, stage):
        self.state_object = _ffi_api.StateComputeRoot(self.state_object,
                                                      self._resolve_stage_id(stage))

+    def cache_read(self, stage, scope_name, reader_stages):
+        """ Schedule primitive corresponds to te.schedule.cache_read.
+


Can you explain what this step does?

This step does the same thing as te.schedule.cache_read does. We choose to add a pointer to te.schedule.cache_read instead of copying the docstring from it.

Maybe we can make the pointer clearer (e.g., say "see also te.schedule.cache_read").

Added a pointer to te.schedule.cache_read; we may also add this to other steps later.

python/tvm/auto_scheduler/loop_state.py

src/auto_scheduler/loop_state.h

jroesch · 2020-07-23T20:47:38Z

src/auto_scheduler/loop_state.h

@@ -225,6 +238,13 @@ class StateNode : public Object {
   * operation.
   */
  AttachMap attach_map;
+  /*!
+   * \brief The up-to-date ComputeDAG of this state, used for some steps that may change the


Can you explain this better? Given the above methods, it seems that current_compute_dag might in fact not be up-to-date, since some scheduling steps modify the compute DAG.

This dag is always up-to-date.
The comment in the above method says the "initial dag" may not be up-to-date. So we need to store a new up-to-date dag here.
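
A toy Python model of this idea, replaying the stage-inserting steps on the initial DAG to obtain the up-to-date one, might look as follows. The step encoding, the `.cache` naming, and the insertion positions (after the target stage for cache_read, before it for cache_write) are assumptions made for the sketch, not taken from the implementation.

```python
def replay_ops(initial_ops, steps):
    """Replay stage-inserting steps on a copy of the initial op list.

    Each step is a hypothetical (kind, stage_id) tuple.
    """
    ops = list(initial_ops)
    for kind, stage_id in steps:
        if kind == "cache_read":
            # Assumed placement: the cache stage goes right after the stage it reads.
            ops.insert(stage_id + 1, ops[stage_id] + ".cache")
        elif kind == "cache_write":
            # Assumed placement: the cache stage goes right before the target stage.
            ops.insert(stage_id, ops[stage_id] + ".cache")
        # Other steps (split, reorder, ...) leave the op list unchanged.
    return ops


initial_ops = ["A", "B", "C"]
history = [("split", 2), ("cache_read", 0), ("cache_write", 3)]
current_ops = replay_ops(initial_ops, history)

# The number of ops added by the replayed steps is just the length difference.
added_ops = len(current_ops) - len(initial_ops)
```

The initial list is never mutated, which mirrors the distinction drawn above between the initial DAG and the stored up-to-date one.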

src/auto_scheduler/loop_state.h

jroesch · 2020-07-23T20:51:09Z

src/auto_scheduler/transform_step.cc

+/********** Primitives adding new stages **********/
+
+/*!
+ * \brief Common part for steps that add new stages(e.g. CacheReadStep, CacheWriteStep,


Can you explain this more?

jroesch · 2020-07-23T20:52:03Z

src/auto_scheduler/transform_step.cc

+  const ComputeDAG& current_compute_dag =
+      dag.ReplayAndGetDAG(GetStageModifiableSteps(GetRef<Step>(this), (*state)->transform_steps));
+  int added_ops = current_compute_dag->ops.size() - last_dag_op_size;
+  // TODO(jcf94): Update this check to equal after fixing the cache write bug in TVM


Can we track all of these in an Ansor tracking issue instead of putting TODOs in the code? My worry is that it is very easy to forget about all the bugs that must be resolved before stabilizing a new subsystem.

Yeah, I can open an issue later.
For this specific line, this is a bug in TVM and this feature is not used in Ansor, so it is okay to skip this.

src/auto_scheduler/transform_step.h

jroesch · 2020-07-23T20:59:54Z

tests/python/unittest/test_auto_scheduler_loop_state.py

+    #
+    # Seems there's bug with the input/output tensor. Such multi outputs case
+    # should be unusual, so we make some hack on DoCacheWrite
+    # To be fixed in the future


Same comment as above, we shouldn't use TODOs but we should track these in an Ansor stabilization PR

src/auto_scheduler/transform_step.cc

src/auto_scheduler/transform_step.h

jcf94 · 2020-07-24T02:21:58Z

@merrymercy @junrushao1994 @jroesch @comaniac Comments are addressed, please take another look.
I'll submit the PRs for the remaining steps soon after this.

merrymercy · 2020-07-24T02:28:22Z

I opened an issue to track the unresolved TODO items during upstreaming: #6133.
However, none of the TODOs in this PR are related to Ansor; they are outside its scope and won't be tracked in the above issue either.
They are just minor TVM bugs we found during development. We have no plan to fix them because we don't use these features.

zhiics · 2020-07-24T18:04:12Z

src/auto_scheduler/transform_step.cc

+ * RfactorStep). This will return all steps that can change the number of stages in a ComputeDAG,
+ * and stop by the current step.
+ */
+Array<Step> GetFormerStageModifiableSteps(Step current_step, const Array<Step>& transform_steps) {


This name is not quite clear. How about GetMutableSteps?

Or StageMutableSteps?
It seems that just MutableSteps is still not quite clear.
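
Whatever name is chosen, the behaviour described in the doc comment can be sketched in a few lines of Python. The step encoding and all names are hypothetical, and including the current step itself in the result is an assumption about what "stop by the current step" means.

```python
# Step kinds that can change the number of stages (per the doc comment above).
STAGE_MODIFYING_KINDS = {"cache_read", "cache_write", "rfactor"}


def former_stage_modifying_steps(current_step, transform_steps):
    """Collect stage-count-changing steps from the history, stopping at current_step."""
    result = []
    for step in transform_steps:
        if step[0] in STAGE_MODIFYING_KINDS:
            result.append(step)
        if step is current_step:  # identity check: the same step object in the history
            break
    return result


transform_steps = [
    ("split", 0),
    ("cache_read", 0),
    ("reorder", 1),
    ("cache_write", 2),
    ("rfactor", 3),
]
selected = former_stage_modifying_steps(transform_steps[3], transform_steps)
```

Steps after the current one are ignored, so replaying the result reproduces the DAG as it looked when the current step was applied.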

merrymercy · 2020-07-25T12:08:13Z

@jcf94 Sorry, you have to rebase because I moved the header files to the public folder include/tvm/auto_scheduler in #6103

jcf94 · 2020-07-27T01:45:27Z

@jcf94 Sorry, you have to rebase because I moved the header files to the public folder include/tvm/auto_scheduler in #6103

Fine, updated with the new upstream master.

merrymercy · 2020-07-27T04:28:35Z

@jroesch @comaniac please take another look and approve

comments are addressed

merrymercy · 2020-07-27T04:35:23Z

I dismissed @jroesch's review because all of his comments are addressed.
Most of his concerns stem from some TODO items caused by a minor TVM bug, which is outside the scope of this PR.
Following his suggestion, I also opened an issue (#6133) to track other TODO items.

Add cache_read/cache_write step

d12465d

comaniac requested changes Jul 22, 2020

Update

920f4b1

comaniac requested changes Jul 22, 2020

jcf94 added 3 commits July 23, 2020 09:49

Update

86c3670

Update

abfb150

Update state->current_compute_dag to Optional

3c1da64

junrushao reviewed Jul 23, 2020

jroesch previously requested changes Jul 23, 2020

merrymercy reviewed Jul 24, 2020

merrymercy reviewed Jul 24, 2020

merrymercy reviewed Jul 24, 2020

Update

2a113d3

jcf94 requested review from merrymercy and jroesch July 24, 2020 02:16

jcf94 added 2 commits July 24, 2020 10:39

Update doc

bf660a8

Update

3649e26

merrymercy requested changes Jul 24, 2020

Update

85da7e0

zhiics reviewed Jul 24, 2020

Doc update

87e703a

tqchen added the status: need review label Jul 25, 2020

jcf94 added 2 commits July 27, 2020 09:22

Merge branch 'upstream_master' into cache_read_write

18b19ad

Update

334de3b

jcf94 requested a review from merrymercy July 27, 2020 01:45

merrymercy approved these changes Jul 27, 2020

merrymercy merged commit b8f8b8d into apache:master Jul 27, 2020

jcf94 deleted the cache_read_write branch July 29, 2020 07:29

ZihengJiang mentioned this pull request Sep 25, 2020

TVM v0.7 Release Note Candidate #6486

Closed
