
[DOC] How to use dynamic in QuantizeConfig #1177

Closed
sidhantls opened this issue Jan 28, 2025 · 4 comments

@sidhantls

I'm trying to figure out how to use different quantization bits for different layers. I found that this can be done through QuantizeConfig's `dynamic` field. However, in what format am I supposed to pass it to `dynamic`?

Is `dynamic` a dictionary mapping layer names to the number of bits? I know how many bits I want to use for each linear layer, but I'm unable to figure out the expected format from:

dynamic: Optional[Dict[str, Dict[str, Union[int, bool]]]] = field(default=None)

Qubitium (Collaborator) commented Jan 29, 2025

@sidhantls Please see our CI test for a simple example:

# support dynamic override of bits, group_size, desc_act, sym for each layer/module match
dynamic = {
    # `.*\.` matches the layers_node prefix
    # layer index starts at 0
    r".*\.18\..*gate.*": {"bits": 8, "group_size": 64},  # match layer 18 gate module
    r".*\.19\..*gate.*": {"bits": 8, "group_size": 64},  # match layer 19 gate module
    r".*\.20\..*gate.*": {"bits": 8, "group_size": 64},  # match layer 20 gate module
    r".*\.21\..*gate.*": {"bits": 8, "group_size": 64},  # match layer 21 gate module
}

In the above example, the bits and group_size overrides apply only to the matched gate modules in layers 18-21 (zero-indexed) and take precedence over the base QuantizeConfig values.
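
To apply the overrides, pass the dict to QuantizeConfig via its dynamic argument. A minimal sketch (the base bits/group_size values here are just illustrative defaults):

from gptqmodel import QuantizeConfig

quantize_config = QuantizeConfig(
    bits=4,           # base bits for every module that matches no dynamic rule
    group_size=128,   # base group_size
    dynamic=dynamic,  # per-module overrides defined above
)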

Usage notes:

  1. Dynamic rules allow "+:" (positive) and "-:" (negative) matching prefixes. If you do not use a prefix, the rule defaults to positive matching. Negative matching helps with skipping some modules entirely (see the sketch after this list).
  2. Rules are Python `re` regex based.
  3. When a dynamic rule is matched, its key/value pairs override the default QuantizeConfig. You can currently override the following properties: [bits, group_size, sym]. Please use the release 1.7.4 tagged branch; main may not be stable as I am fixing some bugs and doing refactoring.
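
A rough sketch of mixing positive and negative rules, based on note 1 above (the lm_head module name is illustrative, and the empty dict value for the negative rule is my assumption of how a skip is expressed):

dynamic = {
    # no prefix defaults to positive matching ("+:"): quantize layer 18 gate modules at 8-bit
    r".*\.18\..*gate.*": {"bits": 8, "group_size": 64},
    # "-:" prefix = negative matching: skip quantizing any module matched by the rest of the regex
    r"-:.*lm_head.*": {},  # assumption: the value is ignored for negative rules
}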

Reference matching code:

# dynamic bits, group_size, sym for each layer/module
if dynamic is not None:
    if dynamic_get(dynamic=dynamic, layer_name=name) == False:  # noqa: E712
        # skip creating this quant linear
        continue
    for pattern, pattern_dict in dynamic.items():
        if re.match(pattern, name):
            d_bits = pattern_dict.get("bits", bits)
            d_group_size = pattern_dict.get("group_size", group_size)
            d_sym = pattern_dict.get("sym", sym)
            break

Qubitium (Collaborator) commented Jan 29, 2025

main should be stable now: I have fixed the regressions around the new pack_dtype option, and our CI tests are passing.

@sidhantls Please note that on vLLM, dynamic requires our pending vLLM PR if you need full-speed performance. Otherwise, dynamic is only supported via the GPTQModel inference APIs. Use the marlin kernel for the fastest batching performance.
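
For example, a rough sketch of selecting the marlin kernel when loading through the GPTQModel API (the BACKEND enum and backend= argument are my reading of the loader signature; check the README for the exact API):

from gptqmodel import GPTQModel, BACKEND

# load a quantized checkpoint and request the marlin kernel for fast batched inference
model = GPTQModel.load("saved_model", backend=BACKEND.MARLIN)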

I have also added a note about dynamic usage in the README.md: https://github.com/ModelCloud/GPTQModel?tab=readme-ov-file#dynamic-quantization-per-module-quantizeconfig-override

@Qubitium Qubitium changed the title how to use dynamic in GPTQModelConfig [DOC] How to use dynamic in QuantizeConfig Jan 29, 2025
@Qubitium (Collaborator)

@sidhantls Do you have enough info to test out dynamic?

sidhantls (Author) commented Feb 3, 2025

@Qubitium Hey, thanks for following up on this. Yes, I did try it out and it works:

from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "EleutherAI/pythia-160m"

calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train"
).select(range(1024))["text"]

calibration_dataset = [" ".join(item.split()[:30]) for item in calibration_dataset]

dynamic = {
    # `.*\.` matches the layers_node prefix
    # layer index start at 0
    r".*\.dense_4h_to_h.*": {"bits": 8, "group_size": 128},  # match layer 1 gate module
    r".*\.dense_h_to_4h.*": {"bits": 4, "group_size": 128},  # match layer 2 gate module
}
quantize_config = QuantizeConfig(
    bits=4,
    dynamic=dynamic,
    group_size=128,
)

model = GPTQModel.load(model_id, quantize_config)

# increase `batch_size` to match gpu/vram specs to speed up quantization
model.quantize(calibration_dataset, batch_size=1)
model.save("saved_model")
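
For completeness, a quick sketch of loading the saved checkpoint back for inference (I'm assuming generate delegates to the underlying Hugging Face model and that the tokenizer is fetched from the original model id; adjust as needed):

from transformers import AutoTokenizer
from gptqmodel import GPTQModel

tokenizer = AutoTokenizer.from_pretrained(model_id)  # or "saved_model" if the tokenizer was saved alongside
model = GPTQModel.load("saved_model")

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)  # assumes the loaded model exposes a device attribute
output = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output[0], skip_special_tokens=True))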

@Qubitium Qubitium closed this as completed Feb 4, 2025