[MetaSchedule] Adding post optimization in MetaSchedule to Improve Scheduling #17104
Description
This pull request aims to enhance model optimization by adding a post-optimization phase to MetaSchedule.
Using Droplet Search as the post-optimization step (Droplet paper), we reduced the number of trials explored by MetaSchedule while still producing faster kernels. We observed this improvement on the following architectures: Nvidia A100, Nvidia 3080, AMD x86, and ARM A64FX. The results can be found in this report: bennu paper
Proposed Changes
Motivation
This pull request adds an exploitation phase to MetaSchedule, based on the coordinate descent algorithm. By iteratively refining the best kernel identified by MetaSchedule, we achieve two key benefits: fewer trials are needed during the search, and the resulting kernels run faster.
Thus, this PR optimizes MetaSchedule along two crucial dimensions: search efficiency and kernel performance.
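To illustrate the idea of the exploitation phase, here is a minimal sketch of coordinate descent over a discrete tuning space. It is not the code in this PR and does not use any TVM API: the names `coordinate_descent`, `toy_cost`, and `TILE_SIZES` are hypothetical, and `toy_cost` stands in for measuring a real kernel. The search starts from the best point found so far, changes one knob at a time by a single step, and stops when no neighbor is better.

```python
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[int, ...]

# Hypothetical candidate values for each tunable knob (e.g. tile sizes).
TILE_SIZES = [1, 2, 4, 8, 16, 32]


def toy_cost(point: Point) -> float:
    """Stand-in for a kernel measurement; lower is better.

    The minimum is at tile sizes (8, 4), i.e. indices (3, 2).
    """
    x = TILE_SIZES[point[0]]
    y = TILE_SIZES[point[1]]
    return abs(x - 8) + abs(y - 4) + 1.0


def coordinate_descent(
    start: Sequence[int],
    sizes: Sequence[int],
    cost: Callable[[Point], float],
    max_rounds: int = 100,
) -> Tuple[Point, float, int]:
    """Refine `start` by moving one coordinate at a time.

    `start` holds one index per knob, and `sizes[i]` is the number of
    candidate values for knob i. Returns the best point, its cost, and
    the number of distinct points that were evaluated.
    """
    best: Point = tuple(start)
    cache: Dict[Point, float] = {best: cost(best)}

    for _ in range(max_rounds):
        improved = False
        for axis in range(len(best)):
            for step in (-1, 1):
                idx = best[axis] + step
                if not 0 <= idx < sizes[axis]:
                    continue  # neighbor falls outside the search space
                neighbor = best[:axis] + (idx,) + best[axis + 1:]
                if neighbor not in cache:
                    cache[neighbor] = cost(neighbor)  # one "trial"
                if cache[neighbor] < cache[best]:
                    best = neighbor
                    improved = True
        if not improved:
            break  # local minimum: no single-step move helps

    return best, cache[best], len(cache)


if __name__ == "__main__":
    # Pretend the exploration phase handed over the point (0, 0).
    point, value, trials = coordinate_descent(
        start=(0, 0),
        sizes=(len(TILE_SIZES), len(TILE_SIZES)),
        cost=toy_cost,
    )
    print(point, value, trials)
```

Because each round only evaluates the immediate neighbors of the current best point, the number of trials grows with the length of the path to a local minimum rather than with the size of the whole space, which is the intuition behind pairing it with a broader exploration phase.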
Testing and Validation
We benchmarked the integration of MetaSchedule and Droplet Search on Nvidia A100, AMD x86, and ARM A64FX architectures, measuring kernel speed and search time against a MetaSchedule run of 10,000 trials. These results are available in Section 3 of this manuscript: paper
Additional Notes
This pull request builds upon prior research and experimentation in model optimization. The proposed approach improves end-to-end model performance across diverse hardware platforms while reducing MetaSchedule's search time. We welcome the community's feedback, suggestions, and contributions to further refine these methods.
Thank you.
Sincerely,
Michael Canesche, Gaurav Verma, and Fernando Pereira