
mixed-precision quantization milestone1: naive_intNwo + eval/benchmark framework #531

Merged: 12 commits into main on Aug 1, 2024

Conversation

@Hanxian97 (Contributor) commented Jul 19, 2024

Summary:
This is a prototype for mixed-precision quantization. It consists of a naive implementation of 2/3/5/6-bit integer weight-only quantization. Together with the existing int4wo and int8wo in torchao, it provides an evaluation framework that leverages lm_eval for mixed-precision quantization of Llama3.

Test Plan:
To test the naive implementation of the quantization APIs: python test/quantization/test_naive_intNwo.py


pytorch-bot bot commented Jul 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/531

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e516f0b with merge base 00b76c4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 19, 2024
@Hanxian97 Hanxian97 marked this pull request as draft July 21, 2024 23:47
@andrewor14 (Contributor) left a comment

Hi @Hanxian97, thanks for the PR. I think you can remove all the files except for mp_quant_eval.py, naive_intNwo.py, and test_naive_intNwo.py. Left a few minor comments other than that.



Contributor

Hi @Hanxian97, I feel we don't want to push these experiment scripts to torchao. Can you remove them from the PR? (OK to keep them in your own separate branch for now)

Contributor Author

Thanks for the comment. I have removed the experiment scripts and kept only mp_quant_eval.py, naive_intNwo.py, and test_naive_intNwo.py.

ZeroPointDomain,
)

def intN_weight_only_asym(group_size=32, n=8):
Contributor

Can you add a short docstring to describe what this is doing? Maybe add an example of using this with the quantize_ API? (same for intN_weight_only_sym)

Contributor

Also, should we limit this to n = [2, 3, 4, 5, 6, 8] for now? (throw an error otherwise)

Contributor Author

Added the docstring and an assertion to limit n to [2, 3, 4, 5, 6, 8] only.
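For reference, a minimal sketch (not the exact code from this PR) of what such a docstring and bit-width assertion might look like for the asymmetric variant, assuming the torchao quantize_ API:

```python
def intN_weight_only_asym(group_size=32, n=8):
    """
    Apply n-bit asymmetric weight-only quantization to linear layers.

    Args:
        group_size: number of weight elements sharing one scale/zero point
        n: bit width to quantize to; only 2, 3, 4, 5, 6 and 8 are accepted

    Hypothetical usage with the quantize_ API:
        quantize_(model, intN_weight_only_asym(group_size=32, n=6))
    """
    assert n in [2, 3, 4, 5, 6, 8], f"Unsupported bit width: {n}"
    ...  # return the weight transform for the requested bit width
```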

@@ -0,0 +1,95 @@
import torch
Contributor

You might want to call this file something else, since you're about to do the real sensitivity analysis.

Contributor Author

Removed this file for now; the real sensitivity analysis will be committed in milestone 2.

model = AutoModelForCausalLM.from_pretrained(repo_id).to(device="cpu", dtype=precision)

if quantization == "int8dq":
    quantize_(model.to(device=device), int8_dynamic_activation_int4_weight())
Contributor

This seems wrong? On main it's int8_dynamic_activation_int8_weight:

quantize_(model, int8_dynamic_activation_int8_weight())

Actually, we can probably just delete this case?

Contributor Author

Removed this for now since we will not use it.

@@ -0,0 +1,27 @@
import torch
Contributor

By the way, I think we need to move this to torchao/test if we want it to run as part of CI.

Contributor Author

Moved test_naive_intNwo.py under test/quantization now.

Comment on lines 14 to 29
target_dtype = torch.int8
quant_min = 0
quant_max = 2**n-1
Contributor

Should target_dtype be torch.uint8 for this?

Contributor Author

Fixed this.
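For context, a sketch of what the asymmetric quantization range looks like once the container dtype is switched to uint8; the values follow the snippet above and the exact code in the PR may differ:

```python
import torch

n = 6                       # bit width, e.g. 2/3/5/6
target_dtype = torch.uint8  # unsigned container for asymmetric quantization
quant_min = 0
quant_max = 2**n - 1        # 63 for 6-bit; the range fits in uint8 for n <= 8
```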

Comment on lines 62 to 65
if sensi_bit == 8:
    quantize_(model.to(device=device), int8_weight_only(), filter_fn_sen)
elif sensi_bit == 4:
    quantize_(model.to(device=device), int4_weight_only(group_size=group_size), filter_fn_sen)
Contributor

You could merge this logic into intN_weight_only_asym, I think.

Contributor Author

Merged them into intN_weight_only now.
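With the branches folded into intN_weight_only, the call site shown above could collapse to something like the following fragment; the names are taken from that snippet and this is only a sketch, not the PR's final code:

```python
# single call regardless of bit width; intN_weight_only dispatches internally
quantize_(model.to(device=device), intN_weight_only(n=sensi_bit, group_size=group_size), filter_fn_sen)
```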

Comment on lines 22 to 23
bit_zeropoint = 2 # Example value, please adjust as needed
bit_scale = 2 # Example value, please adjust as needed
Contributor

Are these in bytes?

Contributor Author

@Hanxian97 Hanxian97 Jul 24, 2024

Yes, these are in bytes. I have fixed this. Thanks!

return total_size_gb

# Example usage
num_elements = 250945664  # number of elements in one Llama3 linear layer
Contributor

Can this be calculated from the model instead of hardcoded? Also, I feel a better integration is to just fix and extend

def get_model_size_in_bytes(model, ignore_embeddings=False):

Contributor Author

Yes, this is a temporary solution for Llama3. Thanks for the suggestion! I will try to generalize it by extending get_model_size_in_bytes.
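A hedged sketch of how the hardcoded element count could instead be computed by walking the model, in the spirit of extending get_model_size_in_bytes; the helper name and the per-group metadata accounting are illustrative assumptions, not torchao's implementation:

```python
import torch.nn as nn

def quantized_linear_weight_size_in_bytes(model: nn.Module, bit_width: int,
                                          group_size: int = 32,
                                          bit_zeropoint: int = 2,
                                          bit_scale: int = 2) -> float:
    """Estimate the storage of all nn.Linear weights after n-bit groupwise quantization."""
    total_bytes = 0.0
    for module in model.modules():
        if isinstance(module, nn.Linear):
            numel = module.weight.numel()
            total_bytes += numel * bit_width / 8                             # packed weights
            total_bytes += numel / group_size * (bit_zeropoint + bit_scale)  # per-group scale/zero point
    return total_bytes
```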

torch._inductor.config.force_fuse_int_mm_with_mul = True
torch._inductor.config.fx_graph_cache = True

def intN_weight_only(group_size=32, n=8):
Contributor

I'd suggest naming this in more detail, since you have different dtypes and asymmetric/symmetric variants; in this case it's uintN_asymmetric_weight_only (or perhaps pass asymmetric/symmetric as an argument).

Contributor Author

I passed asymmetric/symmetric as an argument and merged them into intN_weight_only.
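A minimal sketch of that interface under assumed names: an explicit symmetric flag, with the 8-bit and 4-bit cases falling back to the existing torchao configs and the remaining bit widths going through the naive path (the _naive_intN_transform helper below is a hypothetical stand-in for the code added in this PR):

```python
from torchao.quantization.quant_api import int4_weight_only, int8_weight_only

def _naive_intN_transform(group_size, n, symmetric):
    # placeholder for the naive 2/3/5/6-bit weight-only transform from this PR
    raise NotImplementedError

def intN_weight_only(group_size=32, n=8, symmetric=False):
    assert n in [2, 3, 4, 5, 6, 8], f"Unsupported bit width: {n}"
    if n == 8:
        return int8_weight_only()
    if n == 4:
        return int4_weight_only(group_size=group_size)
    # 2/3/5/6-bit: naive affine quantization, symmetric or asymmetric
    return _naive_intN_transform(group_size, n, symmetric)
```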

eps = 1e-6
preserve_zero = False
zero_point_dtype = torch.bfloat16
zero_point_domain = ZeroPointDomain.FLOAT
Contributor

Why is this FLOAT?

Contributor Author

Thanks for pointing it out. Just changed it to INT.

@andrewor14 (Contributor) left a comment

Approving to unblock. Thanks!

eps = 1e-6
preserve_zero = False
zero_point_dtype = torch.bfloat16
zero_point_domain = ZeroPointDomain.FLOAT
Contributor

I think this should be ZeroPointDomain.INT. FLOAT is mainly for the optimized int4 tinygemm kernel right now.

Contributor Author

Thanks for pointing it out. Just changed it to INT.
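For reference, a hedged sketch of what the INT zero-point-domain settings typically look like in torchao's affine quantization; only the domain change is confirmed in this thread, the other values here are assumptions:

```python
import torch
from torchao.quantization.quant_primitives import ZeroPointDomain

eps = 1e-6
zero_point_dtype = torch.int64           # integer zero points (assumption)
zero_point_domain = ZeroPointDomain.INT  # FLOAT is reserved for the int4 tinygemm kernel
```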

@@ -0,0 +1,46 @@
import torch
Contributor

Maybe call this test_mixed_precision.py, to match your prototype folder and to cover your future test cases as well?

Contributor Author

Renamed it.

@andrewor14 andrewor14 marked this pull request as ready for review July 24, 2024 17:14
    test_weight_only_quant(i, False)
    print(f"Test passed for {i}-bit using naive intNwo asymmetric quantization implementation")
except Exception as e:
    print(f"Exception handled in test loop for {i}-bit asymmetric quantization. Details: {e}")
Contributor

@andrewor14 andrewor14 Jul 25, 2024

You might want to actually re-raise this exception too? Otherwise it'll be hard to catch the test when it fails.
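A minimal sketch of that suggestion: keep the log line but re-raise so CI still sees the failure. It continues the snippet above; the surrounding loop and bit widths are assumed for illustration:

```python
for i in [2, 3, 5, 6]:
    try:
        test_weight_only_quant(i, False)
        print(f"Test passed for {i}-bit asymmetric quantization")
    except Exception as e:
        print(f"Failure for {i}-bit asymmetric quantization: {e}")
        raise  # re-raise so the test run actually fails
```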

import os
import sys
# append the path to the naive_intNwo.py file
sys.path.append(os.path.join(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))), "torchao/quantization/prototype/mixed_precision/scripts"))
Contributor

Does it work if you just add an empty __init__.py to torchao/quantization/prototype/mixed_precision? Then you won't need this line anymore?

Contributor Author

Added __init__.py and removed the path append.
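With the package marker in place, the test can import the prototype directly; a hedged sketch, with the module path inferred from the sys.path line above and assuming the scripts directory is also a package:

```python
from torchao.quantization.prototype.mixed_precision.scripts.naive_intNwo import intN_weight_only
```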

@Hanxian97 Hanxian97 force-pushed the Hanxian_MixedPrecision branch 4 times, most recently from 75b55c2 to ec36a94 Compare July 30, 2024 16:05
@andrewor14 andrewor14 merged commit c023f71 into main Aug 1, 2024
13 checks passed
jainapurva pushed a commit that referenced this pull request Aug 7, 2024
mixed-precision quantization milestone1: naive_intNwo + eval/benchmark framework (#531)

* milestone1: naive_intNwo + eval/benchmark

* remove experiment scripts

* remove exp files

* use default ZeroPointDomain.INT for int2/3/5/6

* renamed test_naive_intNwo.py to test_mixed_precision.py

* updated intNwo with _get_linear_subclass_inserter

* adjust sqnr threshold according to bit width

* fixed test for int4wo and add __init__.py

* skip test_aq_int8_weight_only_quant_3_subclass due to seg fault on nightly

* edit the sqnr threshold

* add unittest

* correct import path