[AutoScheduler] Guarantee init population sampling outputs a valid set #6713

comaniac · 2020-10-19T23:54:44Z

The previous implementation of initial population sampling treated a state as valid as long as no problem was encountered while applying the initial population rules. However, it is possible that none of the states produced by this phase can be lowered, have their features extracted, or pass GPU code verification (due to an invalid thread number or memory usage). In that case, the evo search is trapped in a set of invalid states and finds a valid point only inefficiently.

In this PR, we improve the initial population sampling to perform the same process as the evo search. Specifically, every state has to be lowered and estimated to make sure it is valid. Note that this also increases the time spent sampling the initial population. Here is an example log of tuning a forward conv2d (last layer in ResNet) on an Nvidia T4 for 64 trials:

------------------------------------------------------------
-------------------------  [ Search ]
------------------------------------------------------------
Generate Sketches               #s: 1
Encountered 14 errors during feature extraction, which are safely ignored.
Encountered 8 errors during feature extraction, which are safely ignored.
Encountered 6 errors during feature extraction, which are safely ignored.
Encountered 9 errors during feature extraction, which are safely ignored.
Encountered 8 errors during feature extraction, which are safely ignored.
Sample Iter: 5  #Pop: 7 #Target: 50     fail_ct: 243    Time elapsed: 0.24
Encountered 3 errors during feature extraction, which are safely ignored.
Encountered 5 errors during feature extraction, which are safely ignored.
Encountered 12 errors during feature extraction, which are safely ignored.
Encountered 11 errors during feature extraction, which are safely ignored.
Encountered 5 errors during feature extraction, which are safely ignored.
Sample Iter: 10 #Pop: 11        #Target: 50     fail_ct: 489    Time elapsed: 0.45
Encountered 8 errors during feature extraction, which are safely ignored.
Encountered 10 errors during feature extraction, which are safely ignored.
Encountered 10 errors during feature extraction, which are safely ignored.
Encountered 8 errors during feature extraction, which are safely ignored.
Encountered 7 errors during feature extraction, which are safely ignored.
Sample Iter: 15 #Pop: 18        #Target: 50     fail_ct: 732    Time elapsed: 0.68
Encountered 9 errors during feature extraction, which are safely ignored.
Encountered 8 errors during feature extraction, which are safely ignored.
Encountered 7 errors during feature extraction, which are safely ignored.
Encountered 7 errors during feature extraction, which are safely ignored.
Encountered 7 errors during feature extraction, which are safely ignored.
Sample Iter: 20 #Pop: 26        #Target: 50     fail_ct: 974    Time elapsed: 0.88
Encountered 7 errors during feature extraction, which are safely ignored.
Encountered 10 errors during feature extraction, which are safely ignored.
Encountered 10 errors during feature extraction, which are safely ignored.
Encountered 7 errors during feature extraction, which are safely ignored.
Encountered 7 errors during feature extraction, which are safely ignored.
Sample Iter: 25 #Pop: 35        #Target: 50     fail_ct: 1215   Time elapsed: 1.11
Encountered 4 errors during feature extraction, which are safely ignored.
Encountered 10 errors during feature extraction, which are safely ignored.
Encountered 5 errors during feature extraction, which are safely ignored.
Encountered 11 errors during feature extraction, which are safely ignored.
Encountered 13 errors during feature extraction, which are safely ignored.
Sample Iter: 30 #Pop: 45        #Target: 50     fail_ct: 1455   Time elapsed: 1.34
Encountered 4 errors during feature extraction, which are safely ignored.
Encountered 7 errors during feature extraction, which are safely ignored.
Encountered 13 errors during feature extraction, which are safely ignored.
Encountered 9 errors during feature extraction, which are safely ignored.
Sample Initial Population       #s: 51  fail_ct: 1649   Time elapsed: 1.52
/home/ubuntu/.local/lib/python3.6/site-packages/numpy/core/_methods.py:34: RuntimeWarning: invalid value encountered in reduce
  return umr_minimum(a, axis, None, out, keepdims, initial, where)
/home/ubuntu/.local/lib/python3.6/site-packages/numpy/core/_methods.py:30: RuntimeWarning: invalid value encountered in reduce
  return umr_maximum(a, axis, None, out, keepdims, initial, where)
GA Iter: 0      Max score: 0.9969       Min score: 0.0391       #Pop: 51        #M+: 0  #M-: 0
Encountered 359 errors during feature extraction, which are safely ignored.
Encountered 369 errors during feature extraction, which are safely ignored.
Encountered 328 errors during feature extraction, which are safely ignored.
Encountered 299 errors during feature extraction, which are safely ignored.
Encountered 319 errors during feature extraction, which are safely ignored.
GA Iter: 5      Max score: 0.9999       Min score: 0.9839       #Pop: 2048      #M+: 1462       #M-: 0
Encountered 288 errors during feature extraction, which are safely ignored.
Encountered 292 errors during feature extraction, which are safely ignored.
Encountered 308 errors during feature extraction, which are safely ignored.
Encountered 258 errors during feature extraction, which are safely ignored.
Encountered 284 errors during feature extraction, which are safely ignored.
GA Iter: 10     Max score: 1.0000       Min score: 0.9932       #Pop: 2048      #M+: 1582       #M-: 0
EvolutionarySearch              #s: 128 Time elapsed: 29.74
------------------------------------------------------------
-------------------------  [ Measure ]
------------------------------------------------------------
Get 64 programs for measure. (This may take a while)
................................****************************T****
# skip logs. Max throughput 3317.14 GFlop/s
................................*E******************E*********E****
# skip logs. Max throughput 3317.14 GFlop/s
Median execution time: 2.051 ms

In addition, this PR separates the size of the sampled initial population from the population size used in the evolutionary search, as we don't need too many randomly sampled candidates. The initial population size is set to 50 by default.
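To illustrate the idea described above, here is a minimal Python sketch of a sampling loop that keeps only validated states and stops once a separate initial-population target is reached. This is not the PR's actual C++ implementation: the names `sample_init_population`, `sample_candidates`, `is_valid`, `batch_size`, and `max_fail_ct` are hypothetical, and `is_valid` merely stands in for the lowering, feature extraction, and GPU verification checks.

```python
import random
from typing import Callable, List, Tuple


def sample_init_population(
    sample_candidates: Callable[[int], List[object]],  # hypothetical random sampler
    is_valid: Callable[[object], bool],  # stand-in for lowering / feature checks
    target_size: int = 50,  # initial population size, decoupled from the GA size
    batch_size: int = 50,  # hypothetical number of candidates per iteration
    max_fail_ct: int = 5000,  # hypothetical give-up threshold
) -> Tuple[List[object], int]:
    """Collect valid states until the target is met or too many candidates fail."""
    population: List[object] = []
    fail_ct = 0
    while len(population) < target_size and fail_ct < max_fail_ct:
        batch = sample_candidates(batch_size)
        if not batch:
            break  # the sampler has nothing more to offer
        for state in batch:
            if is_valid(state):
                population.append(state)
            else:
                fail_ct += 1
    return population, fail_ct


# Toy usage: integers play the role of states, and multiples of 7 are "valid".
rng = random.Random(0)
population, fail_ct = sample_init_population(
    lambda n: [rng.randrange(1000) for _ in range(n)],
    lambda state: state % 7 == 0,
)
print(len(population), fail_ct)
```

Because a whole batch is processed before the loop condition is rechecked, the result may slightly exceed the target, which matches the log above ending with 51 states for a target of 50.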

cc @merrymercy @jcf94 @FrozenGene

comaniac · 2020-10-23T16:35:14Z

The potential CI issue is also resolved.
@merrymercy PTAL.

comaniac requested a review from merrymercy October 19, 2020 23:54

merrymercy self-assigned this Oct 20, 2020

comaniac added 7 commits October 21, 2020 21:20

maintain valid states in init pop

6f3ac49

miner fix

0ff6f6c

fix format

303e572

comment

988dd6a

reset

dbc603f

fix

9f2782e

decouple parameters

e4e3398

comaniac force-pushed the ansor_init_pop branch from 6222568 to e4e3398 Compare October 21, 2020 21:32

ZihengJiang added the status: need review label Oct 21, 2020

comaniac added 4 commits October 21, 2020 23:04

format

f45c408

Merge branch 'main' into ansor_init_pop

448c58a

fix

799c0b6

add artifact

5aac618

tqchen approved these changes Oct 23, 2020

tqchen merged commit 7158a4b into apache:main Oct 23, 2020

comaniac deleted the ansor_init_pop branch October 23, 2020 17:53

masahi pushed a commit to masahi/tvm that referenced this pull request Oct 23, 2020

[AutoScheduler] Guarantee init population sampling outputs a valid set (apache#6713)

1e7964e

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Oct 29, 2020

[AutoScheduler] Guarantee init population sampling outputs a valid set (apache#6713)

ebbe1ea

comaniac added status: accepted and removed status: need review labels Nov 2, 2020

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 2, 2020

[AutoScheduler] Guarantee init population sampling outputs a valid set (apache#6713)

4b534b8

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 4, 2020

[AutoScheduler] Guarantee init population sampling outputs a valid set (apache#6713)

cc21fea

trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Dec 4, 2020

[AutoScheduler] Guarantee init population sampling outputs a valid set (apache#6713)

e882aaf
