Update Quantization Save Defaults #22

Satrat · 2024-07-12T14:09:20Z

SUMMARY:
We now default to save_compressed=True when running oneshot() or save_pretrained(), so quantized models are compressed automatically on save. The default save formats are also updated as follows:

w4a16 - packed-quantized
w8a16 - packed-quantized
w8a8 (int8) - int-quantized
w8a8 (fp8) - float-quantized
w4a16 2:4 - marlin-24
w8a16 2:4 - marlin-24
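
For reviewers, the mapping above can be read as a simple lookup table. The sketch below is illustrative only and is not the code changed in this PR: the names `DEFAULT_SAVE_FORMATS` and `default_save_format`, the scheme keys, and the `sparse_2of4` flag are all hypothetical. It covers only the six combinations listed above and rejects anything else rather than guessing.

```python
# Hypothetical lookup table transcribed from the list in this PR description.
# Keys are (scheme, has_2of4_sparsity); these names are assumptions made for
# illustration, not the library's actual identifiers.
DEFAULT_SAVE_FORMATS = {
    ("w4a16", False): "packed-quantized",
    ("w8a16", False): "packed-quantized",
    ("w8a8-int8", False): "int-quantized",
    ("w8a8-fp8", False): "float-quantized",
    ("w4a16", True): "marlin-24",
    ("w8a16", True): "marlin-24",
}


def default_save_format(scheme: str, sparse_2of4: bool = False) -> str:
    """Return the default save format for a listed scheme.

    Raises ValueError for any combination not listed in the PR summary,
    since the summary does not say what those should map to.
    """
    try:
        return DEFAULT_SAVE_FORMATS[(scheme, sparse_2of4)]
    except KeyError:
        raise ValueError(
            f"no default listed for scheme={scheme!r}, sparse_2of4={sparse_2of4}"
        ) from None


if __name__ == "__main__":
    print(default_save_format("w4a16"))                    # dense 4-bit weights
    print(default_save_format("w8a16", sparse_2of4=True))  # 2:4 sparse 8-bit weights
```

Keeping the mapping in a single table makes it easy to compare against the unit test mentioned in the test plan.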

TEST PLAN:
Added a unit test for the default save formats.

Sara Adkins added 3 commits July 12, 2024 14:04

update quantization defaults

9956e8e

Merge branch 'main' into sa/quant_defaults

63d46d2

edge cases

2876718

Satrat requested review from bfineran, robertgshaw2-neuralmagic and markurtz July 12, 2024 14:25

fix tests

3dced68

bfineran approved these changes Jul 15, 2024


bfineran merged commit 50642a6 into main Jul 15, 2024
8 of 12 checks passed

bfineran deleted the sa/quant_defaults branch July 15, 2024 14:37
