
[Ansor][AutoTVM v2.0] Phase 1: feature extraction for cost models #6190

Merged: 10 commits into apache:master from the pr-feature-extraction branch on Aug 12, 2020

Conversation

merrymercy (Member) commented Aug 2, 2020

For the full upstream plan, see Ansor RFC.

This PR adds feature extraction for the cost models.
It is similar to the existing feature extraction in autotvm, but:

  1. Uses a general format for all loop structures, so all workloads can share the same cost model.
  2. Includes more analysis results (e.g., reuse distance) in the feature vector.

(A rough usage sketch follows.)
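As an illustration of how these per-store features are meant to be consumed from Python, here is a hedged sketch. The function names follow python/tvm/auto_scheduler/feature.py added in this PR, but treat the exact signatures, return values, and the record filename as assumptions to be checked against the merged code.

# Hedged sketch: extract per-store features from measurement records for a cost model.
# Function names mirror python/tvm/auto_scheduler/feature.py; verify signatures upstream.
from tvm import auto_scheduler
from tvm.auto_scheduler import feature

# "measure_records.json" is a placeholder path for previously logged measurements.
inputs, results = auto_scheduler.RecordReader("measure_records.json").read_lines()

# One fixed-length feature vector per innermost BufferStore statement, so all
# workloads (regardless of loop structure) can share a single cost model.
features, throughputs, task_ids = feature.get_per_store_features_from_measure_pairs(
    inputs, results
)

print("number of records:", len(features))
print("some feature names:", feature.get_per_store_feature_names()[:5])

In the PR itself the heavy lifting is done in C++ (src/auto_scheduler/feature.cc) and exposed through these Python wrappers.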

@merrymercy merrymercy changed the title [Ansor][AutoTVM v2.0] Phase 1 feature extraction for xgboost model [Ansor][AutoTVM v2.0] Phase 1: feature extraction for cost models Aug 2, 2020
merrymercy (Member, Author)

cc @tqchen @junrushao1994 @jcf94 @comaniac @whbldhwj

@merrymercy merrymercy force-pushed the pr-feature-extraction branch 5 times, most recently from 0ace7c7 to 710ef76 on August 3, 2020 at 21:45
comaniac (Contributor) left a comment:

Reviewed all but feature.cc. Will finish it by tomorrow.

[Review threads on include/tvm/auto_scheduler/feature.h, python/tvm/auto_scheduler/feature.py, and src/auto_scheduler/feature.cc (outdated, resolved)]
comaniac (Contributor) left a comment:

Tried to review the features, but they are too specific. Do you think we could introduce a registration mechanism like register("feature_name", extractor) to manage them better?
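For concreteness, here is a minimal sketch of the kind of registration mechanism being suggested (the names are hypothetical; nothing like this is implemented in the PR, and the reply below explains why it was not adopted).

# Hypothetical per-feature registration, as suggested above. Not part of this PR.
_FEATURE_EXTRACTORS = {}

def register(feature_name):
    """Decorator registering an extractor callable under a feature name."""
    def _do_register(extractor):
        _FEATURE_EXTRACTORS[feature_name] = extractor
        return extractor
    return _do_register

@register("loop_extent_product")
def loop_extent_product(loop_extents):
    # Product of loop extents around the store statement.
    prod = 1
    for extent in loop_extents:
        prod *= extent
    return prod

# Each registered extractor is run independently over the same input.
features = {name: fn([4, 8, 32]) for name, fn in _FEATURE_EXTRACTORS.items()}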

[Review thread on python/tvm/auto_scheduler/feature.py (outdated, resolved)]
merrymercy (Member, Author)

@comaniac It is not easy to do so. Right now a lot of computation is shared across different features; if we computed them separately, redundant computation would be introduced. In addition, the global context is also shared across these features, and how to register the required global context for a specific feature is also a problem.

[Review threads on src/auto_scheduler/feature.cc and python/tvm/auto_scheduler/feature.py (outdated, resolved)]
# The format for n records is:
# {
# int n;
# int[n+2] sizes
junrushao1994 (Member):

nitpick on the doc

Suggested change
# int[n+2] sizes
# int sizes[0]
# ...
# int sizes[n + 1]

yangjunpro:

Personally, I think junrushao1994's suggestion is clearer than the current comment convention, since "float sizes[n + 1]" illustrates the format semantics more precisely than "float[sizes[0]]".

merrymercy (Member, Author), Aug 12, 2020:

I won't take either of your suggestions. I use "int[x] variable" to denote an array of x integer values with the name "variable".

@junrushao1994's suggestion is wrong because "int sizes[n + 1]" has another meaning, a declaration of n + 1 integers, but here we actually mean the single integer at that index.
@yangjunpro's suggestion is also wrong. It makes "sizes" the name of the field, but "sizes" here only specifies the size.
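For readers of this thread, here is a small sketch of how the documented header ("int n;" followed by "int[n+2] sizes") might be decoded, assuming every value in the buffer, including the ints, is serialized as float32 as in the snippet quoted below. The real parsing code lives in python/tvm/auto_scheduler/feature.py; this is only illustrative.

import struct

SIZE_OF_FLOAT32 = 4  # every value, including the ints, is stored as float32

def decode_header(byte_arr):
    """Illustrative decoding of "int n;" followed by "int[n+2] sizes"."""
    offset = 0

    # int n;  (stored as float32, so round back to int)
    (n_as_float,) = struct.unpack_from("f", byte_arr, offset=offset)
    offset += SIZE_OF_FLOAT32
    n = int(n_as_float + 0.5)

    # int[n+2] sizes;  (the sizes of the arrays that follow)
    raw_sizes = struct.unpack_from("%df" % (n + 2), byte_arr, offset=offset)
    offset += SIZE_OF_FLOAT32 * (n + 2)
    sizes = [int(x + 0.5) for x in raw_sizes]

    return n, sizes, offset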

n_stmts = struct.unpack_from("f", byte_arr, offset=offset)
offset += SIZE_OF_FLOAT

n_stmts = int(n_stmts[0] + 0.5)
A reviewer (Member) commented:
Perhaps add a comment here, like "to avoid rounding error"? By the way, why do we use a float for an integer?

merrymercy (Member, Author), Aug 11, 2020:

Some of them are ints while the others are floats. I want to store all of them in a single array, but we do not have a union type in tvm::Object, so I use a single float array to store both ints and floats.

yangjunpro:

+1. How about storing the float array and n_stmts separately? We might need to add some extra code, but I think the program semantics would be clearer.
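Background on the int(x + 0.5) idiom discussed here: float32 represents integers exactly only up to 2^24, and a count that has passed through float arithmetic may land marginally below the intended integer, so rounding before truncation is a defensive guard. A tiny self-contained illustration, not taken from the PR:

import struct

def int_through_float32(value):
    """Pack an int into a float32 slot (as the feature buffer does) and read it back."""
    (as_float,) = struct.unpack("f", struct.pack("f", float(value)))
    # Plain int(as_float) truncates; adding 0.5 rounds to the nearest integer,
    # guarding against values such as 6.9999995 that truncation would get wrong.
    return int(as_float + 0.5)

assert int_through_float32(7) == 7
assert int_through_float32(1 << 20) == 1 << 20  # still exactly representable in float32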

[Review threads on python/tvm/auto_scheduler/feature.py and src/auto_scheduler/feature.cc (resolved)]
@tqchen tqchen self-assigned this Aug 10, 2020
@tqchen tqchen added the "status: need review" and "status: need update" labels Aug 10, 2020
tqchen (Member) commented Aug 10, 2020:

Per discussion with @merrymercy:

  • Split the feature extraction logic into a FeatureExtractor (called back by the visitor) and the visitor itself.
  • Provide a FeatureExtractor for each group of features.

(A rough sketch of this split follows.)
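A Python-flavored sketch of the proposed split, purely for illustration: the real code is C++, the names here are hypothetical, and the PR ultimately keeps a single visitor that calls per-group member functions directly, as described in the next comment.

# Hypothetical "visitor + per-group FeatureExtractor callbacks" design.
from typing import Dict, List


class FeatureExtractor:
    """One extractor per feature group; called back by the visitor at each store."""

    def extract(self, store_node, context: Dict) -> Dict[str, float]:
        raise NotImplementedError


class ComputationFeatureExtractor(FeatureExtractor):
    def extract(self, store_node, context):
        # e.g. read op counts accumulated by the visitor while walking the IR
        return {"float_ops": float(context.get("float_ops", 0))}


class StoreVisitor:
    """Walks the IR; at each store statement it invokes every registered extractor."""

    def __init__(self, extractors: List[FeatureExtractor]):
        self.extractors = extractors

    def visit_store(self, store_node, context: Dict) -> Dict[str, float]:
        features: Dict[str, float] = {}
        for extractor in self.extractors:
            features.update(extractor.extract(store_node, context))
        return features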

merrymercy (Member, Author) commented Aug 11, 2020:

I organized the features into 5 groups.

  // Group 1: Computation related features
  // Group 2: Buffer access related features (per buffer)
  // Group 3: Arithmetic intensity related features
  // Group 4: Allocation related features
  // Group 5: Outer scope related features

The specification can be found in src/auto_scheduler/feature.cc::FeatureSet.

Each group has a corresponding extraction function; these are called from the main visitor (PerStoreFeatureExtractor).

  void VisitStmt_(const BufferStoreNode* node) final {
    ...
    // Group 1: Computation related features
    ExtractComputationFeature(node, math_op_counter);

    // Group 2: Buffer access related features (per buffer)
    ExtractBufferAccessFeature(node, math_op_counter, &cur_compute_ops, &compute_ops_list,
                               &mem_bytes_list);

    // Group 3: Arithmetic intensity related features
    ExtractArithmeticIntensityFeature(node, cur_compute_ops, compute_ops_list, mem_bytes_list);

    // Group 5: Outer scope related features
    ExtractOuterScopeFeature(node);
  }

  void VisitStmt_(const BufferRealizeNode* node) final {
    StmtExprVisitor::VisitStmt_(node);

    // Group 4: Allocation related features
    ExtractAllocationFeature(node);
  }

I think the code is very clean now and much better than the old autotvm.
I don't like registration or adding an extra layer of callbacks; it is over-design and makes things more complicated.

@tqchen @comaniac @FrozenGene @jroesch @junrushao1994 Your comments are all addressed. This PR is ready to be merged.

comaniac (Contributor) left a comment:

Much cleaner now. LGTM

tqchen (Member) commented Aug 11, 2020:

@merrymercy please fix the conflict

yangjunpro left a comment:

Just some nitpicking comments.

[Review thread on include/tvm/auto_scheduler/feature.h (resolved)]

@merrymercy merrymercy requested a review from tqchen August 12, 2020 11:23
@merrymercy merrymercy merged commit 3565889 into apache:master Aug 12, 2020
@tqchen tqchen added the "status: accepted" label and removed the "status: need review" and "status: need update" labels Aug 12, 2020
wjliu1998 pushed a commit to wjliu1998/incubator-tvm that referenced this pull request Aug 13, 2020
…ache#6190)

* [AutoScheduler] add feature extraction

* fix lint

* fix gpu test

* address comments

* improve flop estimation

* rebase

* refactor with group

* fix

* Apply suggestions from code review
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Aug 26, 2020 (three times) and Sep 2, 2020
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Sep 3, 2020
@merrymercy merrymercy deleted the pr-feature-extraction branch September 27, 2020 00:52