GPTQModel v0.9.10
What's Changed
Ported the vllm/nm gptq_marlin inference kernel, with expanded bits (8-bit), group_size (64, 32), and desc_act support, for all GPTQ models with format = FORMAT.GPTQ. Auto-round nsamples/seqlen parameters are now auto-calculated from the calibration dataset. Fixed save_quantized() when called on pre-quantized models with unsupported backends. HF Transformers dependency updated to ensure the Llama 3.1 fixes are correctly applied at both the quantization and inference stages.
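A minimal loading sketch for the new kernel, assuming the `GPTQModel.from_quantized()` loader and `BACKEND` enum from the library's public API (the model id below is a placeholder; any FORMAT.GPTQ checkpoint applies):

```python
# Sketch: load a pre-quantized FORMAT.GPTQ checkpoint on the ported
# gptq_marlin kernel. BACKEND.MARLIN and from_quantized() are assumed
# from the existing public API; verify against your installed version.
from gptqmodel import GPTQModel, BACKEND

model = GPTQModel.from_quantized(
    "ModelCloud/example-gptq-model",  # placeholder id; 8-bit, group_size 64/32,
    backend=BACKEND.MARLIN,           # and desc_act checkpoints are now supported
)
```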
- [CORE] add marlin inference kernel by @ZX-ModelCloud in #310
- [CI] Increase timeout to 40m by @CSY-ModelCloud in #295, #299
- [FIX] save_quantized() by @ZX-ModelCloud in #296
- [FIX] auto-round nsamples/seqlen now match the actual size of calibration_dataset (see the sketch after this list) by @LRL-ModelCloud in #297, #298
- Update HF transformers to 4.43.3 by @Qubitium in #305
- [CI] remove test_marlin_hf_cache_serialization() by @ZX-ModelCloud in #314
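
As referenced above, a hedged sketch of the quantize-and-save flow touched by the nsamples/seqlen and save_quantized() fixes. The names `QuantizeConfig`, `from_pretrained()`, `quantize()`, and `save_quantized()` are assumed from the existing GPTQModel API, and the calibration rows and output path are placeholders:

```python
# Sketch of the quantize -> save flow affected by #296/#297/#298.
# API names are assumed from the existing GPTQModel interface;
# verify against your installed version.
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(bits=8, group_size=64, desc_act=True)
model = GPTQModel.from_pretrained("meta-llama/Meta-Llama-3.1-8B", quant_config)

# For auto-round, nsamples/seqlen are now derived from this dataset's
# actual size; no manual override is needed (placeholder rows below).
calibration_dataset = [
    "gptqmodel is an llm model quantization toolkit.",
    "marlin is a mixed-precision gptq inference kernel.",
]

model.quantize(calibration_dataset)
model.save_quantized("Llama-3.1-8B-gptq-8bit")  # path is a placeholder
```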
Full Changelog: v0.9.9...v0.9.10