
[DOC] How to use dynamic in QuantizeConfig #1177

Closed
sidhantls opened this issue Jan 28, 2025 · 4 comments

@sidhantls

I'm trying to figure out how to use different quantization bits for different layers. I found that this can be done through QuantizeConfig's `dynamic` field. However, in what format am I supposed to pass it to `dynamic`?

Is `dynamic` a dictionary mapping layer names to the number of bits? I know how many bits I want to use for each linear layer, but I'm unable to figure out the expected format from:

dynamic: Optional[Dict[str, Dict[str, Union[int, bool]]]] = field(default=None)

Qubitium (Collaborator) commented Jan 29, 2025

@sidhantls Please see our CI test for a simple example:

# support dynamic override of bits, group_size, desc_act, sym for each layer/module match
dynamic = {
    # `.*\.` matches the layers_node prefix
    # layer index starts at 0
    r".*\.18\..*gate.*": {"bits": 8, "group_size": 64},  # match layer 18 gate module
    r".*\.19\..*gate.*": {"bits": 8, "group_size": 64},  # match layer 19 gate module
    r".*\.20\..*gate.*": {"bits": 8, "group_size": 64},  # match layer 20 gate module
    r".*\.21\..*gate.*": {"bits": 8, "group_size": 64},  # match layer 21 gate module
}

In the above example, the bits and group_size overrides apply only to the matched gate modules in layers 18-21 (zero-indexed) and take precedence over the base QuantizeConfig values.
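
To apply the overrides, pass the dict to QuantizeConfig via its dynamic argument. A minimal sketch (the base bits/group_size values here are just illustrative defaults):

from gptqmodel import QuantizeConfig

quantize_config = QuantizeConfig(
    bits=4,           # base bits for every module that matches no dynamic rule
    group_size=128,   # base group_size
    dynamic=dynamic,  # per-module overrides defined above
)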

Usage notes:

  1. Dynamic rules allow "+:" (positive) and "-:" (negative) matching prefixes. If you do not use a prefix, the rule defaults to positive matching. Negative matching helps with skipping some modules entirely (see the sketch after this list).
  2. Rules are Python `re` regex based.
  3. When a dynamic rule is matched, its key/value pairs override the default QuantizeConfig. You can currently override the following properties: [bits, group_size, sym]. Please use the release 1.7.4 tagged branch; main may not be stable as I am fixing some bugs and doing refactoring.
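
A rough sketch of mixing positive and negative rules, based on note 1 above (the lm_head module name is illustrative, and the empty dict value for the negative rule is my assumption of how a skip is expressed):

dynamic = {
    # no prefix defaults to positive matching ("+:"): quantize layer 18 gate modules at 8-bit
    r".*\.18\..*gate.*": {"bits": 8, "group_size": 64},
    # "-:" prefix = negative matching: skip quantizing any module matched by the rest of the regex
    r"-:.*lm_head.*": {},  # assumption: the value is ignored for negative rules
}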

Reference matching code:

# dynamic bits, group_size, sym for each layer/module
if dynamic is not None:
    if dynamic_get(dynamic=dynamic, layer_name=name) == False:  # noqa: E712
        # skip creating this quant linear
        continue
    for pattern, pattern_dict in dynamic.items():
        if re.match(pattern, name):
            d_bits = pattern_dict.get("bits", bits)
            d_group_size = pattern_dict.get("group_size", group_size)
            d_sym = pattern_dict.get("sym", sym)
            break

Qubitium (Collaborator) commented Jan 29, 2025

main should be stable now: I have fixed the regressions around the new pack_dtype option, and our CI tests are passing.

@sidhantls Please note that on vLLM, dynamic requires our pending vLLM PR if you need full-speed performance. Otherwise, dynamic is only supported via the GPTQModel inference APIs. Use the marlin kernel for the fastest batching performance.
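
For example, a rough sketch of selecting the marlin kernel when loading through the GPTQModel API (the BACKEND enum and backend= argument are my reading of the loader signature; check the README for the exact API):

from gptqmodel import GPTQModel, BACKEND

# load a quantized checkpoint and request the marlin kernel for fast batched inference
model = GPTQModel.load("saved_model", backend=BACKEND.MARLIN)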

I have also added a note about dynamic usage in the README.md: https://github.com/ModelCloud/GPTQModel?tab=readme-ov-file#dynamic-quantization-per-module-quantizeconfig-override

@Qubitium Qubitium changed the title how to use dynamic in GPTQModelConfig [DOC] How to use dynamic in QuantizeConfig Jan 29, 2025
@Qubitium (Collaborator)

@sidhantls Do you have enough info to test out dynamic?

sidhantls (Author) commented Feb 3, 2025

@Qubitium Hey, thanks for following up on this. Yes, I did try it out and it works:

from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "EleutherAI/pythia-160m"

calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train"
).select(range(1024))["text"]

calibration_dataset = [" ".join(item.split()[:30]) for item in calibration_dataset]

dynamic = {
    # `.*\.` matches the layers_node prefix
    # layer index start at 0
    r".*\.dense_4h_to_h.*": {"bits": 8, "group_size": 128},  # match layer 1 gate module
    r".*\.dense_h_to_4h.*": {"bits": 4, "group_size": 128},  # match layer 2 gate module
}
quantize_config = QuantizeConfig(
    bits=4,
    dynamic=dynamic,
    group_size=128,
)

model = GPTQModel.load(model_id, quantize_config)

# increase `batch_size` to match gpu/vram specs to speed up quantization
model.quantize(calibration_dataset, batch_size=1)
model.save("saved_model")
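
For completeness, a quick sketch of loading the saved checkpoint back for inference (I'm assuming generate delegates to the underlying Hugging Face model and that the tokenizer is fetched from the original model id; adjust as needed):

from transformers import AutoTokenizer
from gptqmodel import GPTQModel

tokenizer = AutoTokenizer.from_pretrained(model_id)  # or "saved_model" if the tokenizer was saved alongside
model = GPTQModel.load("saved_model")

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)  # assumes the loaded model exposes a device attribute
output = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output[0], skip_special_tokens=True))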

@Qubitium Qubitium closed this as completed Feb 4, 2025