
[UX] Allow quantization of weights without calibration data #28

Closed
mgoin opened this issue Jul 19, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

mgoin (Member) commented Jul 19, 2024

When using round-to-nearest or static scaling for quantization formats like FP8 weights (i.e., applying QuantizationModifier to weights only), there is no need for calibration data. Ideally, no forward pass should be required at all.

Proposed UX:

from transformers import AutoModelForCausalLM
from compressed_tensors.quantization import QuantizationArgs, QuantizationType, QuantizationScheme, QuantizationStrategy
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

FP8_W8 = QuantizationScheme(
    targets=["Linear"],
    weights=QuantizationArgs(
        num_bits=8,
        type=QuantizationType.FLOAT,
        strategy=QuantizationStrategy.TENSOR,
        symmetric=True,
        dynamic=False,
    ),
)

recipe = QuantizationModifier(
    config_groups={"group_0": FP8_W8},
    ignore=["lm_head"],
)

oneshot(
    model=model,
    recipe=recipe,
)
mgoin added the enhancement label Jul 19, 2024
Satrat (Contributor) commented Jul 19, 2024

Hey @mgoin, we do have a PR up for this in compressed-tensors: neuralmagic/compressed-tensors#108, where the weight scale gets initialized when quantization is applied. There is still more work to do here, as it doesn't yet account for the per-token use case, which also doesn't require calibration.
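For context on why the per-token case also needs no calibration: each token's scale is computed at runtime from that token's own activations. A minimal sketch follows, using an int8 grid for simplicity; all names here are illustrative, not the compressed-tensors API:

```python
INT8_MAX = 127  # int8 grid used for illustration; FP8 works analogously

def quantize_per_token(activations):
    """Dynamic per-token quantization: each row (token) gets its own
    scale from its runtime values, so no calibration pass is needed.
    Returns (quantized rows, per-token scales)."""
    q_rows, scales = [], []
    for row in activations:
        abs_max = max(abs(x) for x in row)
        scale = abs_max / INT8_MAX if abs_max else 1.0
        q_rows.append([round(x / scale) for x in row])
        scales.append(scale)
    return q_rows, scales
```

The scales exist only transiently at inference time, which is why neither the weight-only nor the per-token path should require a dataset.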

mgoin (Member, Author) commented Jul 19, 2024

Very nice, thanks @Satrat. Could you share an example of what the UX would look like? I'm not sure how quantization is applied outside of oneshot, or whether it will run without the dataset argument.

Satrat (Contributor) commented Jul 24, 2024

Satrat (Contributor) commented Aug 6, 2024

Closing this out as the PRs mentioned above got merged; this feature is now in main.

Satrat closed this as completed Aug 6, 2024