Add marlin and semi sparse + quant option to autoquant (#1399)
* Add marlin and semi sparse + quant option to autoquant

Summary:
Added DEFAULT_SPARSE_AUTOQUANT_CLASS_LIST for autoquant (v1) that contains:
AQDefaultLinearWeight, AQInt4G128WeightOnlyQuantizedMarlinSparseLinearWeight (float16 only) and AQInt8DynamicallyQuantizedSemiSparseLinearWeight

Test Plan:
tested on llama and sam

python eval_combo.py --coco_root_dir datasets/coco2017 --coco_slice_name val2017 --sam_checkpoint_base_path checkpoints --sam_model_type vit_h --point_sampling_cache_dir tmp/sam_coco_mask_center_cache --mask_debug_out_dir tmp/sam_eval_masks_out --batch_size 32 --num_workers 32 --use_compile max-autotune --use_half bfloat16 --device cuda --compress autoquant-sparse
+cuda,vit_h,32,10271,12,25.575582921440905,39.099793074967025,0.5424332682384179,max-autotune,torch.bfloat16,autoquant-sparse,False,True,True,32,154,4928,None,None
Baseline: around 22/23

python generate.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --compile --compile_prefill --quantization autoquant-sparse --precision float16
Average tokens/sec: 160.55
Base: Average tokens/sec: 110.47

Reviewers:

Subscribers:

Tasks:

Tags:

* ruff
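For a quick reading of the llama numbers reported in the test plan, the relative throughput gain can be worked out as below. This is only an illustrative sketch: the helper name `speedup` is hypothetical and not part of the commit, and it uses nothing beyond the two tokens/sec figures quoted above.

```python
def speedup(optimized_tps: float, baseline_tps: float) -> float:
    """Return the throughput ratio optimized / baseline.

    Hypothetical helper for illustration; not part of torchao or this commit.
    """
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return optimized_tps / baseline_tps


# Figures quoted in the test plan for generate.py with --precision float16.
AUTOQUANT_SPARSE_TPS = 160.55  # Average tokens/sec with autoquant-sparse
BASE_TPS = 110.47              # Average tokens/sec for the base run

ratio = speedup(AUTOQUANT_SPARSE_TPS, BASE_TPS)
print(f"autoquant-sparse vs base: {ratio:.2f}x")
```

With these two figures the ratio comes out to roughly 1.45x. The SAM baseline is only given as "around 22/23", so it is left out of the calculation rather than guessed.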
1 parent 63b30ca, commit 039cef4
Showing 5 changed files with 59 additions and 2 deletions.