
[ Examples ] E2E Examples #5

Merged: 30 commits merged into main from rs/examples on Jul 2, 2024
Conversation

robertgshaw2-neuralmagic (Collaborator) commented on Jun 25, 2024:

SUMMARY:

  • Added examples where user controls dataset preprocessing
  • Added examples with leading models
  • W8A8 Channelwise Weights, Dynamic Per Token Example (Llama-3-8B-Instruct) - GPTQ and SmoothQuant
  • W4A16 G=128 Weights Example (Llama-3-8B-Instruct) - GPTQ
  • READMEs for the full user flow

FOLLOW UP PRs:

  • Updated W4A16 example to use act_order=True (once supported)
  • Migrate W8A8 to code based modifiers

@robertgshaw2-neuralmagic changed the title from [ Examples ] W8A8 and W4A16 Examples to [ Examples ] E2E Examples on Jun 27, 2024
Review thread on examples/quantization/example-w4a16.py (outdated, resolved)
Satrat (Contributor) left a comment:

There are 143 files changed here, could you break out the removal of the copyright headers into a separate PR?

Review thread on examples/quantization/llama7b_fp8_quantization.py (outdated, resolved)
Comment on lines 95 to 100
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)
Contributor:

Note that this will save an uncompressed copy of the model. If that isn't desired, we should set output_dir=SAVE_DIR here; then lines 103-105 will overwrite the uncompressed model.

robertgshaw2-neuralmagic (Collaborator, Author):

Is there a way to not save via output_dir?

Contributor:

You can set output_dir to None explicitly; by default it saves to "./output".
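For context, a minimal sketch of the two options discussed in this thread (the oneshot import path is an assumption; model, ds, recipe, and the constants come from the example script under review):

```python
from llmcompressor.transformers import oneshot  # assumed import path

# Option 1: pass output_dir so oneshot saves the compressed model directly,
# avoiding the extra uncompressed copy noted above.
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    output_dir=SAVE_DIR,
)

# Option 2: suppress saving inside oneshot entirely; per the comment above,
# omitting output_dir saves to "./output" by default.
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    output_dir=None,
)
```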

robertgshaw2-neuralmagic (Collaborator, Author):

> There are 143 files changed here, could you break out the removal of the copyright headers into a separate PR?

I'm not seeing the 143 files?

Satrat (Contributor) commented on Jul 2, 2024:

> > There are 143 files changed here, could you break out the removal of the copyright headers into a separate PR?
>
> I'm not seeing the 143 files?

Never mind, when I reviewed it was showing the removal of all the copyright headers as changes; I just checked again and it's fixed.

robertgshaw2-neuralmagic (Collaborator, Author):

@bfineran @Satrat

Examples are ready to go. Once we clean up the defaults for W8A8 INT8, I can update the recipe accordingly.

We first select the quantization algorithm.

In our case, we will apply the default GPTQ recipe for `int4` (which uses static group size 128 scales) to all linear layers.
> See the `Recipes` documentation for more information on making complex recipes
Contributor:

Can we add a link here, or does this documentation not exist yet?

robertgshaw2-neuralmagic (Collaborator, Author):

It doesn't exist yet.
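A minimal sketch of what this default int4 GPTQ recipe could look like in modifier form (the GPTQModifier import path and the W4A16 scheme name are assumptions, mirroring the FP8 syntax quoted later in this review):

```python
from llmcompressor.modifiers.quantization import GPTQModifier  # assumed import path

# Assumed sketch: GPTQ int4 with static group-size-128 scales applied to all
# Linear layers; lm_head is excluded, as in the FP8 example in this PR.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])
```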

With the dataset ready, we will now apply quantization.

We first select the quantization algorithm. In our case, we will apply the default recipe for `fp8` (which uses static-per-tensor weights and static-per-tensor activations) to all linear layers.
> See the `Recipes` documentation for more information on making complex recipes
Contributor:

Same here.

robertgshaw2-neuralmagic (Collaborator, Author):

It doesn't exist yet.
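For reference, the fp8 recipe syntax from this PR (quoted verbatim in mgoin's review below; only the import path is an assumption):

```python
from llmcompressor.modifiers.quantization import QuantizationModifier  # assumed import path

# Default fp8 scheme: static per-tensor weights and static per-tensor
# activations applied to all Linear layers, with lm_head ignored.
recipe = QuantizationModifier(targets="Linear", scheme="FP8", ignore=["lm_head"])
```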

* Quantize the weights to 8 bits with channelwise scales using GPTQ
* Quantize the activations with a dynamic per-token strategy

> See the `Recipes` documentation for more information on recipes
Contributor:

And here.

robertgshaw2-neuralmagic (Collaborator, Author):

It doesn't exist yet.
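A hedged sketch of the W8A8 recipe described above, combining SmoothQuant and GPTQ as the PR summary mentions (the modifier import paths, the W8A8 scheme name, and the smoothing_strength value are assumptions):

```python
from llmcompressor.modifiers.quantization import GPTQModifier        # assumed import path
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier  # assumed import path

# Assumed sketch: SmoothQuant migrates activation outliers into the weights,
# then GPTQ quantizes the weights to 8 bits with channelwise scales; the
# activations use a dynamic per-token strategy under the W8A8 scheme.
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),  # 0.8 is an illustrative value
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]
```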

Satrat (Contributor) left a comment:

LGTM, will we need to re-run the evals after the preset schemes land?

robertgshaw2-neuralmagic (Collaborator, Author):

> LGTM, will we need to re-run the evals after the preset schemes land?

The evals take ~10 seconds, so they are easy to re-run. The whole flow end-to-end takes ~10 minutes on an H100.

robertgshaw2-neuralmagic (Collaborator, Author):

@Satrat how do I run the linting?

mgoin (Member) left a comment:

LGTM! Just not a fan of the current modifier scheme syntax, i.e. recipe = QuantizationModifier(targets="Linear", scheme="FP8", ignore=["lm_head"]). I would like to import and use Python objects directly for the schemes, for easy lookup in source.
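To illustrate the suggestion, a purely hypothetical sketch of scheme objects that can be imported and looked up in source (neither the module path nor the FP8 object exists in this PR):

```python
# Hypothetical API sketch, not part of this PR: the scheme is an importable
# Python object rather than the string "FP8", so its definition can be
# jumped to directly in source.
from llmcompressor.quantization.schemes import FP8  # hypothetical module

recipe = QuantizationModifier(targets="Linear", scheme=FP8, ignore=["lm_head"])
```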

Review thread on examples/quantization_w4a16/README.md (outdated, resolved)
robertgshaw2-neuralmagic and others added 2 commits on July 2, 2024 at 18:13
Co-authored-by: Michael Goin <michael@neuralmagic.com>

@robertgshaw2-neuralmagic merged commit d746398 into main on Jul 2, 2024 (5 of 8 checks passed)
@robertgshaw2-neuralmagic deleted the rs/examples branch on July 2, 2024 at 18:14
markmc pushed a commit to markmc/llm-compressor that referenced this pull request on Nov 13, 2024:

* draft
* add memoryless
* run bin.quant
* before tests, correctness verified
* specify sparsezoo version
* remove sparsezoo

Co-authored-by: Benjamin Fineran <benjaminfineran@gmail.com>