
Float8 dynamic autoquant #946

Merged 14 commits into main on Oct 2, 2024

Conversation

jainapurva (Contributor) commented Sep 25, 2024:

Added support to autoquant for float8 dynamically quantized linear weights, i.e., float8 weight and activation.
Added a fallback path in safe_int_mm to support float8 with torch.compile.
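
For context, a minimal sketch of how these two new autoquant options could be exercised. The AQFloat8* class names come from this PR; the surrounding usage (the torchao.autoquant entry point, its qtensor_class_list argument, and the import path) is my assumption of the current API, not something confirmed in this PR:

```python
import torch
import torchao
# Import path assumed; the two AQFloat8* classes are the ones added in this PR.
from torchao.quantization.autoquant import (
    AQFloat8WeightOnlyQuantizedLinearWeight,
    AQFloat8DynamicallyQuantizedLinearWeight,
)

# Toy bfloat16 model; float8 matmul kernels need a recent CUDA GPU (e.g. SM89+).
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).cuda().to(torch.bfloat16)

# autoquant times each candidate weight subclass per linear layer and keeps
# the fastest one for the observed input shapes.
model = torchao.autoquant(
    torch.compile(model),
    qtensor_class_list=[
        AQFloat8WeightOnlyQuantizedLinearWeight,
        AQFloat8DynamicallyQuantizedLinearWeight,
    ],
)
x = torch.randn(16, 1024, device="cuda", dtype=torch.bfloat16)
model(x)  # first call records shapes and finalizes the quantization choice
```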

Benchmark: llama3.1b (bfloat16 base model)

| Metric             | Float8 dynamic quant | Float8 weight-only |
| ------------------ | -------------------- | ------------------ |
| Time for inference | 14.55 sec total      | 12.02 sec total    |
| Average tokens/sec | 13.67                | 16.65              |
| Average bandwidth  | 103.05 GB/s          | 125.47 GB/s        |
| Peak memory usage  | 13.77 GB             | 11.97 GB           |
| Model size         | 7.54 GB              | 7.54 GB            |


pytorch-bot bot commented Sep 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/946


✅ You can merge normally! (1 Unrelated Failure)

As of commit ae18023 with merge base fbe97a0:

FLAKY - The following job failed but was likely due to flakiness present on trunk:


facebook-github-bot added the CLA Signed label Sep 25, 2024
jainapurva marked this pull request as ready for review September 26, 2024 22:56
torchao/kernel/intmm.py (outdated review thread, marked resolved)
```python
    return weight

@classmethod
def _autoquant_test(cls, act_mat, weight, bias, best_time, mode=["relu", None]):
```
Contributor commented:

Did you test this in practice? We may need different constants for int8 and float8 dynamic; how does it perform in benchmarks? If you haven't really tested this on 2-3 models, it may be better to just remove it and use the default method, which will be very conservative under the interpolation mode and will still work reasonably under the relu mode.

jainapurva (Author) replied:

I've tested it on Llama; the numbers aren't great, but I can push it to the next PR with more benchmarks.

```diff
@@ -139,13 +139,12 @@ def _get_per_token_block_size(x: torch.Tensor) -> List[int]:
 # taken from
 # https://github.com/mit-han-lab/smoothquant/blob/2f87951dacfb9238d8d657f52ae83a82a3c9ba0c/smoothquant/fake_quant.py#L26
 # and slightly modified
-def quantize_activation_per_token_absmax(t):
+def quantize_activation_per_token_absmax(t, dtype=torch.int8):
```
HDCharles (Contributor) commented Sep 28, 2024:

Does this actually work in practice with non-int8 dtypes? We're still using the same quant min/max of ±128, which seems inadvisable.

I also don't think we should extend this function; it should probably just call into whatever quant function is normally used for that dtype. This is a specific instance of a function where the mapping type and quant min/max are hard-coded to specific values, so it shouldn't be extended.
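
For reference, this is roughly what the hard-coded int8 path computes. A standalone sketch (not the torchao implementation), with the fixed ±128 bounds mentioned above made explicit:

```python
import torch

def per_token_absmax_quant_int8(t: torch.Tensor):
    # Symmetric per-token quantization: one scale per token (last dim = channels).
    # These bounds are int8-specific; silently reusing them for a float8 dtype
    # is exactly the problem flagged in the comment above.
    qmin, qmax = -128, 127
    amax = t.abs().amax(dim=-1, keepdim=True).clamp(min=1e-5)
    scale = amax / qmax
    q = torch.clamp(torch.round(t / scale), qmin, qmax).to(torch.int8)
    return q, scale

q, scale = per_token_absmax_quant_int8(torch.randn(4, 64))
```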

jainapurva (Author) replied:

We don't need this anymore; I'm reverting the changes to this method, since there's another implementation in float8 that I'll be using.

HDCharles (Contributor) left a review comment:

See comments. I would probably skip the quant-test bit unless it gets tested on 2-3 models, and the way the activation quantization is implemented seems likely to cause issues, because it has only been superficially altered away from its normal int8 quantization.

```diff
@@ -492,6 +494,46 @@ def from_float(cls, weight):
         block_size = (1, weight.shape[1])
         return super(AQFloat8WeightOnlyQuantizedLinearWeight, cls).from_hp_to_floatx(weight, block_size, target_dtype=cls.target_dtype, layout_type=Float8LayoutType())

+class AQFloat8DynamicallyQuantizedLinearWeight(AQMixin, LinearActivationQuantizedTensor):
```
Contributor commented:

I think this looks good. Let's name this PerRow scaling, and let's only import PerRow above.
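
For context on the PerRow naming, a minimal sketch of requesting per-row float8 dynamic quantization through torchao's eager quantize_ API. I believe the granularity argument and the PerRow class exist in torchao.quantization, but treat the exact import names as assumptions rather than what this PR specifies:

```python
import torch
from torchao.quantization import (
    quantize_,
    float8_dynamic_activation_float8_weight,
    PerRow,
)

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).cuda().to(torch.bfloat16)

# PerRow gives every weight row (output channel) its own float8 scale,
# matching block_size = (1, weight.shape[1]) in the diff above.
quantize_(model, float8_dynamic_activation_float8_weight(granularity=PerRow()))
```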

jainapurva merged commit 9229df9 into main Oct 2, 2024 (13 checks passed)
melvinebenezer pushed a commit to melvinebenezer/ao that referenced this pull request Oct 7, 2024