
recent torchinductor changes seem to break torchao CI #127924

Closed
jerryzh168 opened this issue Jun 4, 2024 · 1 comment

Comments

jerryzh168 (Contributor) commented Jun 4, 2024

🐛 Describe the bug

See the failing CI run: https://github.com/pytorch/ao/actions/runs/9367512416/job/25795245248

and pytorch/ao#301 (only the PyTorch nightly test fails).

Repro:

  1. Install the most recent PyTorch nightly.
  2. With torchao main (https://github.com/pytorch/ao/tree/main) checked out, run `pytest test/integration/test_integration.py -k test_autoquant_one_input_21_cuda` (see the sketch below for roughly what this test exercises).
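
For reference, here is a minimal sketch of the kind of workload the failing autoquant test goes through, based on the documented torchao autoquant + torch.compile usage. The model, shapes, and dtype below are illustrative assumptions, not the actual test code:

```python
# Minimal sketch (not the actual torchao test): autoquant a small linear model
# and run it under torch.compile on CUDA, which is roughly the path the
# failing test_autoquant_one_input test exercises.
import torch
import torchao

class SmallLinear(torch.nn.Module):  # hypothetical toy model
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(128, 128)

    def forward(self, x):
        return self.linear(x)

model = SmallLinear().to(device="cuda", dtype=torch.bfloat16)
x = torch.randn(1, 128, device="cuda", dtype=torch.bfloat16)

# Documented torchao usage: wrap the compiled model with autoquant so Linear
# weights are swapped for tensor subclasses that pick a quantized kernel per
# input; the regression shows up once inductor compiles this graph.
model = torchao.autoquant(torch.compile(model, mode="max-autotune"))
out = model(x)
```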

Versions

pytorch nightly

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @bdhirsh @anijain2305 @chauhang

desertfire (Contributor) commented:

I verified it reproduces with pytorch commit e216df4.

cpuhrsch added a commit to pytorch/ao that referenced this issue Jun 4, 2024
jerryzh168 pushed a commit to pytorch/ao that referenced this issue Jun 4, 2024
soulitzer added the "high priority" and "module: regression" (It used to work, and now it doesn't) labels on Jun 4, 2024
pytorchmergebot added a commit that referenced this issue Jun 4, 2024
pytorchmergebot referenced this issue Jun 4, 2024
For mixed mm with small sizes of m, such as in the example provided in #127056, being able to set BLOCK_M to 16 leads to better performance. This PR introduces kernel configs that are specific to mixed mm by extending the mm configs with two configs that work well for the example provided in #127056.
I am excluding configs with (BLOCK_M=16, BLOCK_K=16, BLOCK_N=64) because triton crashes when this config is used.

For the example in #127056:
- Without my changes, skip_triton evaluates to true, which disables autotuning. On my machine I achieve 146 GB/s.
- If autotuning is enabled but BLOCK_M>=32, I achieve 614 GB/s.
- With the changes in this PR (i.e. autotuning enabled and BLOCK_M=16), I achieve 772 GB/s.

Pull Request resolved: #127663
Approved by: https://github.com/Chillee
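
For illustration, here is a sketch of what adding small-BLOCK_M Triton autotuning configs of this kind can look like. This is not the actual diff from #127663; the specific block sizes, stage/warp counts, and the name of the config list are assumptions:

```python
# Hypothetical extra autotuning configs for mixed mm with small m, in the
# style of inductor's Triton mm config lists. Per the PR description, the
# (BLOCK_M=16, BLOCK_K=16, BLOCK_N=64) combination is excluded because
# Triton crashes when that config is used.
import triton

extra_mixed_mm_configs = [
    triton.Config({"BLOCK_M": 16, "BLOCK_N": 64, "BLOCK_K": 128}, num_stages=3, num_warps=4),
    triton.Config({"BLOCK_M": 16, "BLOCK_N": 128, "BLOCK_K": 64}, num_stages=4, num_warps=8),
]
```
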
dbyoung18 pushed a commit to dbyoung18/ao that referenced this issue Jul 31, 2024