[AutoScheduler] Improve tuning with random cost model #6835
Conversation
@jcf94 please take a look at the failed test. It fails because all measured schedules are invalid, and the root cause is that this PR changes the number of GA iterations to 3 when the random cost model is used. This means the test is potentially flaky.
Do we need to re-trigger CI?
The problem is that after this PR, that unit test will only try very few candidates, and it will become flaky if all of them are invalid (e.g., a PTX error).
(Force-pushed from 7a95727 to f2666f7.)
@merrymercy @jcf94 I increased the number of measure trials from 2 to 10 to reduce the chance of flakiness (please see the latest commit).
* fix
* more fix
* fix
* revert
* format
* Update sketch_policy.cc
* increase measure trials to avoid flakiness
When tuning a task whose operators are all injective on GPU, Ansor inlines all operators for better performance. However, this makes the expression complex and causes high overhead when lowering the TE schedule, which is required for cost model feature extraction. Since the tuning space of these tasks is relatively small, it is sufficient to use the random cost model and avoid the lowering overhead entirely.
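To make the trade-off concrete, here is a minimal, hypothetical sketch of what a random cost model does: instead of lowering the schedule and extracting features (the expensive step described above), it assigns every candidate a uniformly random score. The class and method names are illustrative, not TVM's actual API.

```python
import random

class RandomCostModel:
    """Hypothetical sketch: score candidates without lowering.

    A learned cost model would lower each TE schedule and extract
    features before predicting; a random model skips all of that.
    """

    def predict(self, states):
        # No lowering, no feature extraction: one random score per state.
        return [random.random() for _ in states]

model = RandomCostModel()
scores = model.predict(["state_a", "state_b", "state_c"])
```

The search then keeps the top-scoring candidates as usual; with random scores this degenerates to uniform sampling over whatever candidates the search policy proposes, which is why the quality of candidate generation (discussed below) matters so much.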
However, the flow of InitPopulation -> RandomStates only gave me one state. After diving into the details, I found that all states generated by the initial population are identical in terms of state.ToStr(). The reasons I can think of are either that 2,048 samples are insufficient to produce two different states, or that ToStr() does not differentiate the states. Here is an example output of initial population sampling with de-duplication (a state is thrown away if it is already in out_states):
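The de-duplication described above can be sketched as follows (hypothetical helper names; the real logic lives in the C++ sketch policy). The key point is that when the string key collapses distinct states, even a huge sampling budget yields a single unique state.

```python
def sample_unique(sample_fn, to_str, target, max_samples):
    """Keep sampling until `target` unique states are found or the
    budget `max_samples` is exhausted; duplicates (by string key)
    are thrown away, mirroring the out_states de-duplication."""
    out_states, seen = [], set()
    n_sampled = 0
    while len(out_states) < target and n_sampled < max_samples:
        state = sample_fn()
        n_sampled += 1
        key = to_str(state)
        if key not in seen:  # discard states already in out_states
            seen.add(key)
            out_states.append(state)
    return out_states, n_sampled

# If to_str maps every sampled state to the same string, even a 50K
# budget produces only one unique state, as observed in the log:
states, n = sample_unique(lambda: {"tile": 4}, str, target=2, max_samples=50000)
```

Here `states` holds a single entry while `n` reaches the full 50,000 budget, matching the behavior reported below.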
As can be seen, even 50K samples cannot produce a second state. I added a set to check the number of unique states before and after infer bound, and here are the results:
This log implies two points:
We should call ToStr() after infer bound to make sure we can differentiate states. As we can see from the log, all 2,048 candidates have the same ToStr() before infer bound, but we get 7 different ToStr() values after infer bound.
Systematically mutating tile sizes is far more efficient than random sampling. As the log shows, we get 7 different states from only 2,048 mutated states, but only 1 state from 43,007 random samples.
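The first point above can be sketched as follows (names are hypothetical stand-ins, not TVM's API): de-duplicate on the string form computed *after* infer bound, because states that print identically before bound inference can become distinguishable once loop bounds are filled in.

```python
import itertools

def unique_after_infer_bound(states, infer_bound, to_str):
    """De-duplicate states by the string form of their *bound* version."""
    seen, unique = set(), []
    for s in states:
        bound_state = infer_bound(s)  # complete loop bounds first
        key = to_str(bound_state)     # only then compare string forms
        if key not in seen:
            seen.add(key)
            unique.append(bound_state)
    return unique

# Stand-in for infer bound: attaching bound info makes otherwise
# identical-looking states distinguishable.
counter = itertools.count()
infer = lambda s: (s, next(counter))

# Two states with the same pre-bound string survive de-duplication:
result = unique_after_infer_bound(["state", "state"], infer, str)
```

With de-duplication keyed on the pre-bound form, both inputs would collapse into one state; keyed on the post-bound form, both survive.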
Accordingly, this PR runs evolutionary search even when we are using the random cost model. Here is the new log (initial population is set to 1 and retry is set to 2):
cc @merrymercy @jcf94