
recent torchinductor changes seem to break torchao CI #127924

Closed
jerryzh168 opened this issue Jun 4, 2024 · 1 comment

Comments

jerryzh168 (Contributor) commented Jun 4, 2024

🐛 Describe the bug

See the failing CI run: https://github.com/pytorch/ao/actions/runs/9367512416/job/25795245248

and pytorch/ao#301 (only the PyTorch nightly test fails).

Repro:

  1. Install the most recent PyTorch nightly.
  2. With torchao main (https://github.com/pytorch/ao/tree/main) checked out, run `pytest test/integration/test_integration.py -k test_autoquant_one_input_21_cuda` (see the sketch below for roughly what this test exercises).
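
For reference, here is a minimal sketch of the kind of workload the failing autoquant test goes through, based on the documented torchao autoquant + torch.compile usage. The model, shapes, and dtype below are illustrative assumptions, not the actual test code:

```python
# Minimal sketch (not the actual torchao test): autoquant a small linear model
# and run it under torch.compile on CUDA, which is roughly the path the
# failing test_autoquant_one_input test exercises.
import torch
import torchao

class SmallLinear(torch.nn.Module):  # hypothetical toy model
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(128, 128)

    def forward(self, x):
        return self.linear(x)

model = SmallLinear().to(device="cuda", dtype=torch.bfloat16)
x = torch.randn(1, 128, device="cuda", dtype=torch.bfloat16)

# Documented torchao usage: wrap the compiled model with autoquant so Linear
# weights are swapped for tensor subclasses that pick a quantized kernel per
# input; the regression shows up once inductor compiles this graph.
model = torchao.autoquant(torch.compile(model, mode="max-autotune"))
out = model(x)
```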

Versions

pytorch nightly

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @bdhirsh @anijain2305 @chauhang

desertfire (Contributor) commented:

I verified it reproduces with pytorch commit e216df4.

cpuhrsch added a commit to pytorch/ao that referenced this issue Jun 4, 2024
jerryzh168 pushed a commit to pytorch/ao that referenced this issue Jun 4, 2024
soulitzer added the "high priority" and "module: regression" (It used to work, and now it doesn't) labels on Jun 4, 2024
pytorchmergebot added a commit that referenced this issue Jun 4, 2024
pytorchmergebot referenced this issue Jun 4, 2024
For mixed mm with small sizes of m, such as in the example provided in #127056, being able to set BLOCK_M to 16 leads to better performance. This PR introduces kernel configs that are specific to mixed mm by extending the mm configs with two configs that work well for the example provided in #127056.
I am excluding configs with (BLOCK_M=16, BLOCK_K=16, BLOCK_N=64) because triton crashes when this config is used.

For the example in #127056:
- Without my changes, skip_triton evaluates to true, which disables autotuning. On my machine I achieve 146 GB/s.
- If autotuning is enabled but BLOCK_M>=32, I achieve 614 GB/s.
- With the changes in this PR (i.e. autotuning enabled and BLOCK_M=16), I achieve 772 GB/s.

Pull Request resolved: #127663
Approved by: https://github.com/Chillee
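
For illustration, here is a sketch of what adding small-BLOCK_M Triton autotuning configs of this kind can look like. This is not the actual diff from #127663; the specific block sizes, stage/warp counts, and the name of the config list are assumptions:

```python
# Hypothetical extra autotuning configs for mixed mm with small m, in the
# style of inductor's Triton mm config lists. Per the PR description, the
# (BLOCK_M=16, BLOCK_K=16, BLOCK_N=64) combination is excluded because
# Triton crashes when that config is used.
import triton

extra_mixed_mm_configs = [
    triton.Config({"BLOCK_M": 16, "BLOCK_N": 64, "BLOCK_K": 128}, num_stages=3, num_warps=4),
    triton.Config({"BLOCK_M": 16, "BLOCK_N": 128, "BLOCK_K": 64}, num_stages=4, num_warps=8),
]
```
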
dbyoung18 pushed a commit to dbyoung18/ao that referenced this issue Jul 31, 2024