
Commit cdf686f

Authored by rahul-tuli and dsikka
Fix: Re-enable Sparse Compression for 2of4 Examples (#1153)
This PR restores sparse compression for our `2of4` examples, which was previously disabled due to a bug in the vLLM Cutlass integration.

#### Background

A bug in the Cutlass integration caused certain sparse-only compressed models to produce gibberish results. To mitigate this issue, we temporarily turned off sparse compression for our `2of4` examples. The bug has since been fixed by @tlrmchlsmth in vllm-project/vllm#13198. With this fix in place, we can safely re-enable sparse compression for these examples.

#### Changes

- Re-enable sparse compression for `2of4` examples.

#### Testing

- Verified that sparse-only compressed models now produce expected outputs.

---------

Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
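As background for readers unfamiliar with the `2of4` naming: 2:4 semi-structured sparsity requires that every contiguous group of 4 weights contain at least 2 zeros, which is the pattern the Cutlass sparse kernels accelerate. A minimal illustrative check (not part of this PR or of llmcompressor):

```python
# Illustrative check for 2:4 ("2of4") semi-structured sparsity:
# in every contiguous group of 4 weights, at least 2 must be zero.
def is_2of4_sparse(weights):
    assert len(weights) % 4 == 0, "weight count must be a multiple of 4"
    return all(
        sum(1 for w in weights[i:i + 4] if w == 0) >= 2
        for i in range(0, len(weights), 4)
    )

print(is_2of4_sparse([0, 0, 1.5, -2.0, 0, 3.0, 0, 1.0]))  # True
print(is_2of4_sparse([1.0, 2.0, 3.0, 0, 0, 0, 0, 0]))     # False
```

Real checkpoints apply this pattern per row of each weight matrix; the list here is just a flattened stand-in.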
1 parent c8091d3 commit cdf686f

File tree

2 files changed: +2 −4 lines changed


examples/sparse_2of4_quantization_fp8/llama3_8b_2of4.py

Lines changed: 1 addition & 3 deletions
@@ -116,7 +116,5 @@ def get_recipe(fp8_enabled):
     print("==========================================\n")

     # Save compressed model and tokenizer
-    model.save_pretrained(
-        save_dir, save_compressed=args.fp8, disable_sparse_compression=True
-    )
+    model.save_pretrained(save_dir, save_compressed=args.fp8)
     tokenizer.save_pretrained(save_dir)
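The behavioral effect of dropping `disable_sparse_compression=True` can be sketched with a small stub. This is not the real llmcompressor/transformers API, only a hypothetical `FakeModel` that mirrors the two keyword arguments seen in the diff:

```python
# Hypothetical stub illustrating which compression paths the two
# save_pretrained calls from the diff would enable. Not the real API.
class FakeModel:
    def save_pretrained(self, save_dir, save_compressed=False,
                        disable_sparse_compression=False):
        # Quantized (e.g. FP8) compression applies only when
        # save_compressed=True; sparse compression applies unless
        # explicitly disabled.
        return {
            "quant_compressed": save_compressed,
            "sparse_compressed": not disable_sparse_compression,
        }

model = FakeModel()
# Old call: sparse compression disabled as a workaround for the Cutlass bug
old = model.save_pretrained("out", save_compressed=True,
                            disable_sparse_compression=True)
# New call: sparse compression re-enabled (the default)
new = model.save_pretrained("out", save_compressed=True)
```

With the vLLM fix in place, the default (sparse compression on) is safe again, so the extra flag can simply be removed.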

tests/e2e/vLLM/configs/sparse_24.yaml

Lines changed: 1 addition & 1 deletion
@@ -5,4 +5,4 @@ recipe: tests/e2e/vLLM/recipes/Sparse_2of4/recipe_sparse_2of4.yaml
 scheme: sparse2of4_only
 dataset_id: HuggingFaceH4/ultrachat_200k
 dataset_split: train_sft
-save_compressed: False
+save_compressed: True
