adding default inductor config settings #423
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/423
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit d105072 with merge base 96d49cd. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Nice! Mind also updating any relevant documentation pages? Also, if some of those flags are GPU specific, just gate those behind a CUDA flag.
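For illustration, a minimal sketch of the gating being suggested; which flags count as GPU-oriented is an assumption here (and the author notes below that none of these flags turned out to be GPU specific, so the gate was not needed):

```python
import torch
import torch._inductor.config as inductor_config

def set_recommended_configs() -> None:
    """Sketch: apply backend-agnostic flags unconditionally,
    gate GPU-oriented ones behind CUDA availability."""
    # Safe on any backend: coordinate-descent autotuning of generated kernels.
    inductor_config.coordinate_descent_tuning = True
    if torch.cuda.is_available():
        # Illustrative choice only: treat mixed int8/fp16 matmul lowering
        # as GPU-oriented (assumes a torch version that still exposes
        # the use_mixed_mm knob).
        inductor_config.use_mixed_mm = True
```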
Cool LGTM
Force-pushed from 5ab0b95 to bfe2ea2 (compare).
I don't think anything is GPU specific.
Force-pushed from bfe2ea2 to 10a5c4a (compare).
Summary: making the autoquant and quantize APIs call a new recommended_inductor_config_setter util to set recommended configs; also update groupsize -> group_size in generate.py. Test Plan: sh benchmarks.sh, comparing config combinations for matmul precision, mixed_mm, and coordinate_descent (the full labeled results are in the final commit message below).
Force-pushed from 10a5c4a to 0e5fc3e (compare).
```diff
@@ -689,6 +696,7 @@ def test_int8_dynamic_quant_subclass(self, device, dtype):

     @parameterized.expand(COMMON_DEVICE_DTYPE)
     def test_int8_weight_only_quant_subclass(self, device, dtype):
+        undo_recommended_configs()
```
why do we need these?
a bunch of resource usage errors: https://github.com/pytorch/ao/actions/runs/9657170605/job/26635939215
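For context, a sketch of what a reset helper like the undo_recommended_configs() seen in the test diff above could do: restore stock inductor defaults so tuning enabled by one test does not leak into the rest of the suite. The specific flags and default values here are assumptions, not the repo's actual implementation:

```python
import torch
import torch._inductor.config as inductor_config

def undo_recommended_configs() -> None:
    """Hypothetical reset helper: return the flags touched by the
    recommended setter to their stock defaults, since autotuning-heavy
    settings are the likely source of the CI resource-usage failures."""
    inductor_config.coordinate_descent_tuning = False
    inductor_config.coordinate_descent_check_all_directions = False
    inductor_config.force_fuse_int_mm_with_mul = False
    # "highest" is PyTorch's stock float32 matmul precision.
    torch.set_float32_matmul_precision("highest")
```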
Squashed commit history:
* adding default inductor config settings: making the autoquant and quantize APIs call a new recommended_inductor_config_setter util to set recommended configs; also update groupsize -> group_size in generate.py (benchmark results are in the final commit message below)
* fixing tests
* fix weight only failures
* fixing new broken test
* fixing autoquant test
* testing if inductor config is the issue
* are inductor configs somehow being set?
* when is coordinate descent tuning being enabled?
* reset inductor config for tests
* more test fixes
* adding warning
* handling of errors
* option to suppress autoquant errors
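The last two commits point at error handling during autoquant. As a generic illustration of that pattern only (not torchao's actual API; the try_quantize helper and suppress_autoquant_errors flag are hypothetical names):

```python
import warnings

def try_quantize(module, quantize_fn, suppress_autoquant_errors: bool = True):
    """Generic pattern: attempt a quantization candidate; on failure,
    either warn and keep the original module, or re-raise."""
    try:
        return quantize_fn(module)
    except Exception as e:  # e.g. no kernel for a given shape/dtype
        if suppress_autoquant_errors:
            warnings.warn(f"autoquant candidate failed, skipping: {e}")
            return module
        raise
```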
Summary:
Make the autoquant and quantize (and eval and generate) APIs call a new recommended_inductor_config_setter util to set recommended inductor configs.
Also update groupsize -> group_size in generate.py, and handle errors in autoquant (to pass CI).
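A sketch of what the setter amounts to, built only from the knobs this PR names (matmul precision, mixed_mm, coordinate descent); the exact flag list is an assumption rather than the merged implementation:

```python
import torch
import torch._inductor.config as inductor_config

def recommended_inductor_config_setter() -> None:
    """Set the inductor/matmul knobs the benchmarks below found favorable."""
    inductor_config.coordinate_descent_tuning = True
    # check_all_directions gave the best int8wo numbers below (166 tok/s).
    inductor_config.coordinate_descent_check_all_directions = True
    # Fuse the int8 matmul with the following multiply where possible.
    inductor_config.force_fuse_int_mm_with_mul = True
    # "high" rather than the "highest" default float32 matmul precision.
    torch.set_float32_matmul_precision("high")
```

Autoquant and quantize would call this once up front, so users get the tuned configuration in the table below without setting flags by hand.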
Test Plan:
sh benchmarks.sh

Comparison of config combinations for matmul precision, mixed_mm, and coordinate descent. All runs: mod=Llama-2-7b-chat-hf, model_size=6.62 GB.

| config | quant | tok/s | mem/s (GB/s) | peak_mem (GB) |
|---|---|---|---|---|
| high precision | int8dq | 9.14 | 60.55 | 8.33 |
| high precision | int8wo | 147.02 | 973.53 | 8.95 |
| medium precision | int8dq | 9.23 | 61.11 | 8.33 |
| medium precision | int8wo | 139.59 | 924.33 | 8.95 |
| high + mixed_mm_choice heuristic | int8dq | 9.10 | 60.26 | 8.33 |
| high + mixed_mm_choice heuristic | int8wo | 146.98 | 973.23 | 8.95 |
| high + false use_mixed_mm | int8dq | 9.28 | 61.48 | 8.33 |
| high + false use_mixed_mm | int8wo | 146.90 | 972.73 | 8.95 |
| high + default mixed_mm_choice | int8dq | 9.08 | 60.09 | 8.33 |
| high + default mixed_mm_choice | int8wo | 137.58 | 911.00 | 8.95 |
| high + heuristic + coordinate_descent_check_all_directions | int8dq | 9.19 | 60.87 | 8.61 |
| high + heuristic + coordinate_descent_check_all_directions | int8wo | 166.02 | 1099.30 | 8.97 |
| high + false use_mixed_mm + coordinate_descent_check_all_directions | int8dq | 9.28 | 61.46 | 8.33 |
| high + false use_mixed_mm + coordinate_descent_check_all_directions | int8wo | 161.66 | 1070.43 | 8.95 |