[AutoScheduler] Guarantee init population sampling outputs a valid set #6713

Merged
merged 11 commits into from
Oct 23, 2020

@comaniac comaniac commented Oct 19, 2020

The previous implementation of initial population sampling treated a state as valid as long as no error occurred while applying the initial population rules. However, it is possible that none of the states output by this phase can be lowered, have their features extracted, or pass the GPU code verification (due to an invalid thread count or excessive memory usage). In that case, the evolutionary search is trapped in a set of invalid states, and it is inefficient for it to find a valid point.

This PR improves initial population sampling so that it performs the same validation process as the evolutionary search. Specifically, every state has to be lowered and estimated to make sure it is valid. Note that this also increases the time spent sampling the initial population. Here is an example log of tuning a forward conv2d (the last layer in ResNet) on an NVIDIA T4 for 64 trials:

------------------------------------------------------------
-------------------------  [ Search ]
------------------------------------------------------------
Generate Sketches               #s: 1
Encountered 14 errors during feature extraction, which are safely ignored.
Encountered 8 errors during feature extraction, which are safely ignored.
Encountered 6 errors during feature extraction, which are safely ignored.
Encountered 9 errors during feature extraction, which are safely ignored.
Encountered 8 errors during feature extraction, which are safely ignored.
Sample Iter: 5  #Pop: 7 #Target: 50     fail_ct: 243    Time elapsed: 0.24
Encountered 3 errors during feature extraction, which are safely ignored.
Encountered 5 errors during feature extraction, which are safely ignored.
Encountered 12 errors during feature extraction, which are safely ignored.
Encountered 11 errors during feature extraction, which are safely ignored.
Encountered 5 errors during feature extraction, which are safely ignored.
Sample Iter: 10 #Pop: 11        #Target: 50     fail_ct: 489    Time elapsed: 0.45
Encountered 8 errors during feature extraction, which are safely ignored.
Encountered 10 errors during feature extraction, which are safely ignored.
Encountered 10 errors during feature extraction, which are safely ignored.
Encountered 8 errors during feature extraction, which are safely ignored.
Encountered 7 errors during feature extraction, which are safely ignored.
Sample Iter: 15 #Pop: 18        #Target: 50     fail_ct: 732    Time elapsed: 0.68
Encountered 9 errors during feature extraction, which are safely ignored.
Encountered 8 errors during feature extraction, which are safely ignored.
Encountered 7 errors during feature extraction, which are safely ignored.
Encountered 7 errors during feature extraction, which are safely ignored.
Encountered 7 errors during feature extraction, which are safely ignored.
Sample Iter: 20 #Pop: 26        #Target: 50     fail_ct: 974    Time elapsed: 0.88
Encountered 7 errors during feature extraction, which are safely ignored.
Encountered 10 errors during feature extraction, which are safely ignored.
Encountered 10 errors during feature extraction, which are safely ignored.
Encountered 7 errors during feature extraction, which are safely ignored.
Encountered 7 errors during feature extraction, which are safely ignored.
Sample Iter: 25 #Pop: 35        #Target: 50     fail_ct: 1215   Time elapsed: 1.11
Encountered 4 errors during feature extraction, which are safely ignored.
Encountered 10 errors during feature extraction, which are safely ignored.
Encountered 5 errors during feature extraction, which are safely ignored.
Encountered 11 errors during feature extraction, which are safely ignored.
Encountered 13 errors during feature extraction, which are safely ignored.
Sample Iter: 30 #Pop: 45        #Target: 50     fail_ct: 1455   Time elapsed: 1.34
Encountered 4 errors during feature extraction, which are safely ignored.
Encountered 7 errors during feature extraction, which are safely ignored.
Encountered 13 errors during feature extraction, which are safely ignored.
Encountered 9 errors during feature extraction, which are safely ignored.
Sample Initial Population       #s: 51  fail_ct: 1649   Time elapsed: 1.52
/home/ubuntu/.local/lib/python3.6/site-packages/numpy/core/_methods.py:34: RuntimeWarning: invalid value encountered in reduce
  return umr_minimum(a, axis, None, out, keepdims, initial, where)
/home/ubuntu/.local/lib/python3.6/site-packages/numpy/core/_methods.py:30: RuntimeWarning: invalid value encountered in reduce
  return umr_maximum(a, axis, None, out, keepdims, initial, where)
GA Iter: 0      Max score: 0.9969       Min score: 0.0391       #Pop: 51        #M+: 0  #M-: 0
Encountered 359 errors during feature extraction, which are safely ignored.
Encountered 369 errors during feature extraction, which are safely ignored.
Encountered 328 errors during feature extraction, which are safely ignored.
Encountered 299 errors during feature extraction, which are safely ignored.
Encountered 319 errors during feature extraction, which are safely ignored.
GA Iter: 5      Max score: 0.9999       Min score: 0.9839       #Pop: 2048      #M+: 1462       #M-: 0
Encountered 288 errors during feature extraction, which are safely ignored.
Encountered 292 errors during feature extraction, which are safely ignored.
Encountered 308 errors during feature extraction, which are safely ignored.
Encountered 258 errors during feature extraction, which are safely ignored.
Encountered 284 errors during feature extraction, which are safely ignored.
GA Iter: 10     Max score: 1.0000       Min score: 0.9932       #Pop: 2048      #M+: 1582       #M-: 0
EvolutionarySearch              #s: 128 Time elapsed: 29.74
------------------------------------------------------------
-------------------------  [ Measure ]
------------------------------------------------------------
Get 64 programs for measure. (This may take a while)
................................****************************T****
# skip logs. Max throughput 3317.14 GFlop/s
................................*E******************E*********E****
# skip logs. Max throughput 3317.14 GFlop/s
Median execution time: 2.051 ms
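
For readers unfamiliar with the sampling phase, the validated sampling loop described before the log can be sketched roughly as follows. This is a toy illustration, not the TVM implementation: `sample_one`, `is_valid`, `sample_valid_population`, the `(threads, shared memory)` candidate encoding, and the numeric limits are all hypothetical stand-ins. `is_valid` only plays the role of lowering, feature extraction, and GPU code verification.

```python
import random
from typing import List, Tuple

# Illustrative limits only (assumption); real limits depend on the target GPU.
MAX_THREADS_PER_BLOCK = 1024
MAX_SHARED_MEM_KB = 48

Candidate = Tuple[int, int]  # (threads per block, shared memory in KB)


def sample_one(rng: random.Random) -> Candidate:
    """Draw one random candidate (stand-in for applying the init rules)."""
    threads = rng.choice([64, 128, 256, 512, 1024, 2048, 4096])
    smem_kb = rng.choice([8, 16, 32, 48, 64, 96])
    return threads, smem_kb


def is_valid(cand: Candidate) -> bool:
    """Stand-in for lowering, feature extraction, and GPU verification."""
    threads, smem_kb = cand
    return threads <= MAX_THREADS_PER_BLOCK and smem_kb <= MAX_SHARED_MEM_KB


def sample_valid_population(
    target_size: int,
    batch_size: int = 16,
    max_fail_ct: int = 10_000,
    seed: int = 0,
) -> Tuple[List[Candidate], int]:
    """Keep sampling batches until enough *validated* candidates exist."""
    rng = random.Random(seed)
    population: List[Candidate] = []
    fail_ct = 0
    while len(population) < target_size and fail_ct < max_fail_ct:
        for _ in range(batch_size):
            cand = sample_one(rng)
            if is_valid(cand):
                population.append(cand)  # only validated states are kept
            else:
                fail_ct += 1  # invalid states are counted and dropped
    return population, fail_ct


population, fail_ct = sample_valid_population(target_size=50)
print(f"#Pop: {len(population)}  fail_ct: {fail_ct}")
```

Because candidates are accepted a batch at a time, the loop can slightly overshoot the target, which is consistent with the `#s: 51` reported for a target of 50 in the log above. The `max_fail_ct` bound is an assumption added here so the sketch always terminates.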

In addition, this PR separates the size of the sampled initial population from the population size used in evolutionary search, since we don't need many randomly sampled candidates. The initial population size is set to 50 by default.
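
As a rough sketch of why the two sizes can be decoupled, the toy evolutionary loop below seeds the first generation from a small initial population and then fills every later generation up to a larger, independent size. The names `SearchParams`, `init_population`, `evo_population`, `evo_iters`, `score`, `mutate`, and `evolve` are hypothetical and are not the TVM parameter names. The defaults of 50 and 2048 mirror the `#Target: 50` and `#Pop: 2048` values in the log above, and the integer candidates are purely illustrative.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SearchParams:
    # Hypothetical parameter names (assumption), kept separate on purpose.
    init_population: int = 50    # size of the random initial sample
    evo_population: int = 2048   # size of each evolutionary generation
    evo_iters: int = 10          # number of evolutionary iterations


def score(cand: int) -> float:
    """Toy cost model: candidates closer to 100 score higher."""
    return 1.0 / (1.0 + abs(cand - 100))


def mutate(cand: int, rng: random.Random) -> int:
    """Toy mutation: nudge the candidate by a small random step."""
    return cand + rng.randint(-3, 3)


def evolve(
    init_pop: List[int],
    score_fn: Callable[[int], float],
    mutate_fn: Callable[[int, random.Random], int],
    params: SearchParams,
    seed: int = 0,
) -> List[int]:
    """Grow a small initial population into full-size generations."""
    rng = random.Random(seed)
    pop = list(init_pop)
    for _ in range(params.evo_iters):
        # Pick parents with probability proportional to their score.
        weights = [score_fn(c) for c in pop]
        parents = rng.choices(pop, weights=weights, k=params.evo_population)
        pop = [mutate_fn(p, rng) for p in parents]
    return pop


params = SearchParams()
seed_rng = random.Random(0)
init_pop = [seed_rng.randint(0, 200) for _ in range(params.init_population)]
final_pop = evolve(init_pop, score, mutate, params)
print(len(init_pop), len(final_pop))
```

In this sketch the initial sample only needs to be large enough to provide diverse parents; the per-generation size is controlled independently by `evo_population`.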

cc @merrymercy @jcf94 @FrozenGene

@comaniac comaniac requested a review from merrymercy October 19, 2020 23:54
@merrymercy merrymercy self-assigned this Oct 20, 2020
@comaniac

The potential CI issue is also resolved.
@merrymercy PTAL.

@tqchen tqchen merged commit 7158a4b into apache:main Oct 23, 2020
@comaniac comaniac deleted the ansor_init_pop branch October 23, 2020 17:53
4 participants