
int4 fixes and improvements #804

Merged: 8 commits merged into main on Sep 5, 2024
Conversation

HDCharles
Contributor

Summary:

  1. added int4 to autoquant, using hqq by default (see the usage sketch below)
  2. fixed hqq in the normal int4 class so it can actually be used with the normal UX
  3. added hqq to eval/generate
  4. evaluated hqq to make sure it's a reasonable default for autoquant
  5. ran the llama3 eval now that llama3 is working correctly (fixed in the 3.1 PR)
  6. tested hqq vs GPTQ so we have a comparison in our benchmarks/eval
  7. GPTQ was broken -> fixed utils and GPTQ
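(A rough usage sketch for item 1, assuming torchao's autoquant entry point and the int4 candidate list this PR adds; the import path and name DEFAULT_INT4_AUTOQUANT_CLASS_LIST are my reading of the diff and may differ:)

import torch
from torchao import autoquant
from torchao.quantization.autoquant import DEFAULT_INT4_AUTOQUANT_CLASS_LIST  # assumed name from this PR

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).to("cuda")
# autoquant benchmarks each linear layer against the candidate list
# (here the int4, hqq-backed options) and picks the fastest per layer
model = autoquant(torch.compile(model), qtensor_class_list=DEFAULT_INT4_AUTOQUANT_CLASS_LIST)
model(torch.randn(1, 1024, dtype=torch.bfloat16, device="cuda"))  # first call triggers calibration + quantization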

Test Plan:
benchmarks.sh (new autoquant-int4 benchmarks)

export CHECKPOINT_PATH=../../../checkpoints
export MODEL_REPO=meta-llama/Llama-2-7b-chat-hf
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int8wo
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int8dq --compile
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-64-hqq
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-64
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-64-gptq
export MODEL_REPO=meta-llama/Meta-Llama-3-8B
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int8wo
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int8dq --compile
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-64-hqq
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-64
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-64-gptq

(see results in README.md)

Reviewers:

Subscribers:

Tasks:

Tags:


pytorch-bot bot commented Sep 4, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/804

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit dc1a07d with merge base f5703b0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Sep 4, 2024.
@@ -393,7 +393,7 @@ def insert_subclass(lin):
         return insert_subclass


-def int4_weight_only(group_size=128, layout_type=TensorCoreTiledLayoutType(inner_k_tiles=8)):
+def int4_weight_only(group_size=128, layout_type=TensorCoreTiledLayoutType(inner_k_tiles=8), use_hqq=False):
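(A minimal sketch of the new flag in use; quantize_ is the same entry point used in eval.py below, and group_size=64 mirrors the int4wo-64-hqq option from the Test Plan:)

import torch
from torchao.quantization import quantize_, int4_weight_only

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).to("cuda")
# int4 weight-only quantization, with hqq used to pick scales/zeros that reduce quantization error
quantize_(model, int4_weight_only(group_size=64, use_hqq=True))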
Contributor

for this one I'm planning to have a separate hqq function that can work with all dtypes actually

Contributor Author

that's fine, but the way it was set up to work made no sense before, so this is a strict improvement.

README.md (review thread resolved)
torchao/quantization/autoquant.py (review thread resolved)
@@ -68,12 +68,17 @@ def run_evaluation(
         quantize_(model, int8_weight_only())
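(For context, a hypothetical sketch of how string options like int4wo-64-hqq from the Test Plan could be dispatched here; the parsing helper is illustrative, not the PR's actual code:)

from torchao.quantization import quantize_, int4_weight_only

def apply_int4wo(model, quantization: str):
    # illustrative parser: "int4wo-64-hqq" -> group_size=64, use_hqq=True
    parts = quantization.split("-")
    group_size = int(parts[1])
    quantize_(model, int4_weight_only(group_size=group_size, use_hqq=parts[-1] == "hqq"))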
Contributor

btw, does eval work for you? I'm still getting the same result, e.g. for bfloat16 and int8wo; haven't tried other quantization types yet

Contributor Author

yeah, it all runs without issue
what error do you see? there might have been an update to lm_eval we need to address

Contributor

no errors, just getting the exact same eval number for bfloat16 and int8wo; I'll run more quant types tomorrow

@@ -357,7 +357,7 @@ def groupwise_affine_quantize_tensor_from_qparams(
     quant_max = 2 ** n_bit - 1

     int_data = quantize_affine(w, block_size, scales, zeros, output_dtype, quant_min, quant_max, zero_point_domain = ZeroPointDomain.FLOAT)
-    if TORCH_VERSION_AT_LEAST_2_5:
+    if TORCH_VERSION_AT_LEAST_2_5 and w.shape[-1] > 1:
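(Presumably the guard exists because the torch >= 2.5 path packs two int4 values per uint8 along the last dimension, and the single-column tensors GPTQ quantizes one column at a time have nothing to pair. A minimal sketch of that packing constraint, assuming a two-nibbles-per-byte layout:)

import torch

def pack_int4_pairs(int_data: torch.Tensor) -> torch.Tensor:
    # pack adjacent 4-bit values two-per-uint8 along the last dim;
    # a width-1 tensor has nothing to pair, hence the w.shape[-1] > 1 guard
    assert int_data.shape[-1] > 1 and int_data.shape[-1] % 2 == 0
    hi, lo = int_data[..., ::2].to(torch.int32), int_data[..., 1::2].to(torch.int32)
    return ((hi << 4) | lo).to(torch.uint8)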
Contributor

will GPTQ be useful for other types of quantization using quantize_affine, e.g. fp8? any ideas to generalize the single-column handling to all variations of quantize_affine?

@jerryzh168 (Contributor) left a comment

LGTM

@HDCharles merged commit 317392d into main on Sep 5, 2024
14 checks passed