Don't have a sparsity_config if model is quantized #680

rahul-tuli · 2024-09-26T16:40:11Z

Previously, a no-op dense compression configuration was used as the sparsity_config for sparse-quantized models.

However, since we do not yet support sparse compression for sparse-quantized models, the sparsity_config should be left empty. This PR ensures that the sparsity_config remains empty until sparse compression support is implemented for sparse-quantized models.

Note: This is only relevant when the global_sparsity induced by quantization is > 0.05.

Before this PR, the compression_config in config.json was:
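The rule described above can be sketched roughly as follows. This is an illustration only, not the actual llm-compressor code: the function name `infer_sparsity_config`, its arguments, and the shape of the returned dict are hypothetical, and it assumes that no sparsity config is produced at or below the 0.05 threshold. Only the threshold value and the "no sparsity_config for quantized models" rule come from the description.

```python
from typing import Optional

SPARSITY_THRESHOLD = 0.05  # threshold mentioned in the note above


def infer_sparsity_config(global_sparsity: float, is_quantized: bool) -> Optional[dict]:
    """Hypothetical helper: return a sparsity config dict, or None to omit it."""
    if global_sparsity <= SPARSITY_THRESHOLD:
        # Assumption: at or below the threshold, no sparsity config is produced.
        return None
    if is_quantized:
        # Sparse compression of sparse-quantized models is not supported yet,
        # so leave the sparsity config empty instead of emitting a dense no-op.
        return None
    return {
        "format": "dense",  # placeholder value, for illustration only
        "global_sparsity": global_sparsity,
        "sparsity_structure": "unstructured",
    }
```

With the numbers from this PR, `infer_sparsity_config(0.14297135853065746, is_quantized=True)` returns `None`, so nothing would be written under `sparsity_config`.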

  "compression_config": {
    "config_groups": {
      "group_0": {
        "input_activations": null,
        "output_activations": null,
        "targets": [
          "Linear"
        ],
        "weights": {
          "actorder": null,
          "block_structure": null,
          "dynamic": false,
          "group_size": 128,
          "num_bits": 4,
          "observer": "minmax",
          "observer_kwargs": {},
          "strategy": "group",
          "symmetric": true,
          "type": "int"
        }
      }
    },
    "format": "pack-quantized",
    "global_compression_ratio": 1.883165566487463,
    "ignore": [
      "lm_head"
    ],
    "kv_cache_scheme": null,
    "quant_method": "compressed-tensors",
    "quantization_status": "compressed",
    "sparsity_config": {
      "format": "dense",
      "global_sparsity": 0.14297135853065746,
      "ignore": [
        "model.layers.0.self_attn.o_proj",
        "model.layers.0.mlp.gate_proj",
        "model.layers.0.mlp.up_proj",
        "model.layers.0.mlp.down_proj",
        "model.layers.1.self_attn.v_proj",
        "model.layers.1.self_attn.o_proj",
        "model.layers.1.mlp.gate_proj",
        "model.layers.1.mlp.up_proj",
        "model.layers.1.mlp.down_proj",
        "model.layers.2.self_attn.q_proj",
        "model.layers.2.self_attn.k_proj",
        "model.layers.2.self_attn.v_proj",
        "model.layers.2.self_attn.o_proj",
        "model.layers.2.mlp.gate_proj",
        "model.layers.2.mlp.up_proj",
        "model.layers.2.mlp.down_proj",
        "model.layers.3.self_attn.q_proj",
        "model.layers.3.self_attn.k_proj",
        "model.layers.3.self_attn.v_proj",
        "model.layers.3.self_attn.o_proj",
        "model.layers.3.mlp.gate_proj",
        "model.layers.3.mlp.up_proj",
        "model.layers.3.mlp.down_proj",
        "model.layers.4.self_attn.q_proj",
        "model.layers.4.self_attn.k_proj",
        "model.layers.4.self_attn.v_proj",
        "model.layers.4.self_attn.o_proj",
        "model.layers.4.mlp.gate_proj",
        "model.layers.4.mlp.up_proj",
        "model.layers.4.mlp.down_proj",
        "model.layers.5.self_attn.q_proj",
        "model.layers.5.self_attn.k_proj",
        "model.layers.5.self_attn.v_proj",
        "model.layers.5.self_attn.o_proj",
        "model.layers.5.mlp.gate_proj",
        "model.layers.5.mlp.up_proj",
        "model.layers.5.mlp.down_proj",
        "model.layers.6.self_attn.q_proj",
        "model.layers.6.self_attn.k_proj",
        "model.layers.6.self_attn.v_proj",
        "model.layers.6.self_attn.o_proj",
        "model.layers.6.mlp.gate_proj",
        "model.layers.6.mlp.up_proj",
        "model.layers.6.mlp.down_proj",
        "model.layers.7.self_attn.q_proj",
        "model.layers.7.self_attn.k_proj",
        "model.layers.7.self_attn.v_proj",
        "model.layers.7.self_attn.o_proj",
        "model.layers.7.mlp.gate_proj",
        "model.layers.7.mlp.up_proj",
        "model.layers.7.mlp.down_proj",
        "model.layers.8.self_attn.q_proj",
        "model.layers.8.self_attn.k_proj",
        "model.layers.8.self_attn.v_proj",
        "model.layers.8.self_attn.o_proj",
        "model.layers.8.mlp.gate_proj",
        "model.layers.8.mlp.up_proj",
        "model.layers.8.mlp.down_proj",
        "model.layers.9.self_attn.q_proj",
        "model.layers.9.self_attn.k_proj",
        "model.layers.9.self_attn.v_proj",
        "model.layers.9.self_attn.o_proj",
        "model.layers.9.mlp.gate_proj",
        "model.layers.9.mlp.up_proj",
        "model.layers.9.mlp.down_proj",
        "model.layers.10.self_attn.q_proj",
        "model.layers.10.self_attn.k_proj",
        "model.layers.10.self_attn.v_proj",
        "model.layers.10.self_attn.o_proj",
        "model.layers.10.mlp.gate_proj",
        "model.layers.10.mlp.up_proj",
        "model.layers.10.mlp.down_proj",
        "model.layers.11.self_attn.q_proj",
        "model.layers.11.self_attn.k_proj",
        "model.layers.11.self_attn.v_proj",
        "model.layers.11.self_attn.o_proj",
        "model.layers.11.mlp.gate_proj",
        "model.layers.11.mlp.up_proj",
        "model.layers.11.mlp.down_proj",
        "model.layers.12.self_attn.q_proj",
        "model.layers.12.self_attn.k_proj",
        "model.layers.12.self_attn.v_proj",
        "model.layers.12.self_attn.o_proj",
        "model.layers.12.mlp.gate_proj",
        "model.layers.12.mlp.up_proj",
        "model.layers.12.mlp.down_proj",
        "model.layers.13.self_attn.q_proj",
        "model.layers.13.self_attn.k_proj",
        "model.layers.13.self_attn.v_proj",
        "model.layers.13.self_attn.o_proj",
        "model.layers.13.mlp.gate_proj",
        "model.layers.13.mlp.up_proj",
        "model.layers.13.mlp.down_proj",
        "model.layers.14.self_attn.q_proj",
        "model.layers.14.self_attn.k_proj",
        "model.layers.14.self_attn.v_proj",
        "model.layers.14.self_attn.o_proj",
        "model.layers.14.mlp.gate_proj",
        "model.layers.14.mlp.up_proj",
        "model.layers.14.mlp.down_proj",
        "model.layers.15.self_attn.q_proj",
        "model.layers.15.self_attn.k_proj",
        "model.layers.15.self_attn.v_proj",
        "model.layers.15.self_attn.o_proj",
        "model.layers.15.mlp.gate_proj",
        "model.layers.15.mlp.up_proj",
        "model.layers.15.mlp.down_proj",
        "model.layers.16.self_attn.q_proj",
        "model.layers.16.self_attn.k_proj",
        "model.layers.16.self_attn.v_proj",
        "model.layers.16.self_attn.o_proj",
        "model.layers.16.mlp.gate_proj",
        "model.layers.16.mlp.up_proj",
        "model.layers.16.mlp.down_proj",
        "model.layers.17.self_attn.q_proj",
        "model.layers.17.self_attn.k_proj",
        "model.layers.17.self_attn.v_proj",
        "model.layers.17.self_attn.o_proj",
        "model.layers.17.mlp.gate_proj",
        "model.layers.17.mlp.up_proj",
        "model.layers.17.mlp.down_proj",
        "model.layers.18.self_attn.q_proj",
        "model.layers.18.self_attn.k_proj",
        "model.layers.18.self_attn.v_proj",
        "model.layers.18.self_attn.o_proj",
        "model.layers.18.mlp.gate_proj",
        "model.layers.18.mlp.up_proj",
        "model.layers.18.mlp.down_proj",
        "model.layers.19.self_attn.q_proj",
        "model.layers.19.self_attn.k_proj",
        "model.layers.19.self_attn.v_proj",
        "model.layers.19.self_attn.o_proj",
        "model.layers.19.mlp.gate_proj",
        "model.layers.19.mlp.up_proj",
        "model.layers.19.mlp.down_proj",
        "model.layers.20.self_attn.q_proj",
        "model.layers.20.self_attn.k_proj",
        "model.layers.20.self_attn.v_proj",
        "model.layers.20.self_attn.o_proj",
        "model.layers.20.mlp.gate_proj",
        "model.layers.20.mlp.up_proj",
        "model.layers.20.mlp.down_proj",
        "model.layers.21.self_attn.q_proj",
        "model.layers.21.self_attn.k_proj",
        "model.layers.21.self_attn.v_proj",
        "model.layers.21.self_attn.o_proj",
        "model.layers.21.mlp.gate_proj",
        "model.layers.21.mlp.up_proj",
        "model.layers.21.mlp.down_proj",
        "lm_head"
      ],
      "registry_requires_subclass": false,
      "sparsity_structure": "unstructured",
      "targets": [
        "model.layers.0.self_attn.q_proj",
        "model.layers.0.self_attn.k_proj",
        "model.layers.0.self_attn.v_proj",
        "model.layers.1.self_attn.q_proj",
        "model.layers.1.self_attn.k_proj"
      ]
    },
    "version": "0.6.0.20240926"
  }

And now it is:

  "compression_config": {
    "config_groups": {
      "group_0": {
        "input_activations": null,
        "output_activations": null,
        "targets": [
          "Linear"
        ],
        "weights": {
          "actorder": null,
          "block_structure": null,
          "dynamic": false,
          "group_size": 128,
          "num_bits": 4,
          "observer": "minmax",
          "observer_kwargs": {},
          "strategy": "group",
          "symmetric": true,
          "type": "int"
        }
      }
    },
    "format": "pack-quantized",
    "global_compression_ratio": 2.1743527231963227,
    "ignore": [
      "lm_head"
    ],
    "kv_cache_scheme": null,
    "quant_method": "compressed-tensors",
    "quantization_status": "compressed",
    "version": "0.6.0.20240926"
  },
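One way to check a saved model for this change is to look for the `sparsity_config` key in the loaded config. The sketch below uses only the standard library; the helper name `has_sparsity_config` is hypothetical, and the two dicts are trimmed-down stand-ins for the excerpts above rather than full configs.

```python
import json


def has_sparsity_config(config: dict) -> bool:
    """Return True if the compression_config carries a sparsity_config entry."""
    compression_config = config.get("compression_config") or {}
    return "sparsity_config" in compression_config


# Trimmed-down stand-ins for the "before" and "after" excerpts above.
before = json.loads(
    '{"compression_config": {"format": "pack-quantized",'
    ' "sparsity_config": {"format": "dense"}}}'
)
after = json.loads('{"compression_config": {"format": "pack-quantized"}}')

print(has_sparsity_config(before))  # True
print(has_sparsity_config(after))   # False
```

To check a real run, load the `config.json` from the output directory with `json.load` and pass the resulting dict to the same helper.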

Test script:

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot

MODEL_ID = "Xenova/llama2.c-stories110M"
DATASET_ID, SPLIT = "open_platypus", {"calibration": "train[:1%]"}
NUM_CALIBRATION_SAMPLES = 16
OUTPUT_DIR = MODEL_ID.split("/")[-1] + f"-{NUM_CALIBRATION_SAMPLES}-quantized"
MAX_SEQUENCE_LENGTH = 512

# Quantize every Linear layer except lm_head with the W4A16 scheme.
targets = ["Linear"]
ignore = ["lm_head"]
scheme = "W4A16"

recipe = [
    QuantizationModifier(scheme=scheme, targets=targets, ignore=ignore),
]

model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)

# Calibrate, apply the recipe, and save the compressed model (with its config.json).
oneshot(
    model=model,
    dataset=DATASET_ID,
    splits=SPLIT,
    recipe=recipe,
    save_compressed=True,
    output_dir=OUTPUT_DIR,
    overwrite_output_dir=True,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)

print("Compression Done, Model Saved at:", OUTPUT_DIR)

github-actions · 2024-09-26T16:40:22Z

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

kylesayrs

Rather than modifying llm-compressor logic, shouldn't we instead modify the ModelCompressor logic to not save the sparsity config if the sparsity compressor is the dense compressor?

Making the change upstream has the advantage of standardization across all CT users.
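A rough sketch of what that upstream behaviour could look like. It is not the compressed-tensors implementation: `merge_compression_config` and its arguments are hypothetical stand-ins for whatever step assembles the saved config, and `"sparse-bitmask"` is used purely as an example of a non-dense format.

```python
from typing import Optional


def merge_compression_config(
    quantization_config: Optional[dict], sparsity_config: Optional[dict]
) -> dict:
    """Hypothetical stand-in for the step that assembles the saved config."""
    merged = dict(quantization_config or {})
    # A dense "compressor" is a no-op, so recording it in the config adds nothing.
    if sparsity_config is not None and sparsity_config.get("format") != "dense":
        merged["sparsity_config"] = sparsity_config
    return merged


quant = {"format": "pack-quantized", "quant_method": "compressed-tensors"}

# Dense sparsity config is dropped; a non-dense one is kept.
print(merge_compression_config(quant, {"format": "dense"}))
print(merge_compression_config(quant, {"format": "sparse-bitmask"}))
```

Placing the check at serialization time means every caller of the compressor gets the same output, which is the standardization benefit mentioned above.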

rahul-tuli · 2024-09-26T18:12:02Z

> Rather than modifying llm-compressor logic, shouldn't we instead modify the ModelCompressor logic to not save the sparsity config if the sparsity compressor is the dense compressor?
>
> Making the change upstream has the advantage of standardization across all CT users.

I like that, good callout.

rahul-tuli · 2024-09-26T18:34:35Z

Closing this one out; the changes were made on the compressed-tensors side, as recommended by @kylesayrs, in neuralmagic/compressed-tensors#169.

Don't have a sparsity_config if model is quantized

3ff79cd

Changes to ensure marlin24 compressor can be initialized

84136b2

rahul-tuli marked this pull request as ready for review September 26, 2024 16:58

rahul-tuli requested review from mgoin, kylesayrs, dsikka and horheynm September 26, 2024 16:59

kylesayrs reviewed Sep 26, 2024


rahul-tuli closed this Sep 26, 2024
