GPTQModel v1.4.0
What's Changed
⚡ EvalPlus harness integration merged upstream. We now support both lm-eval and EvalPlus.
⚡ Added pure-torch Torch kernel.
⚡ Refactored the Cuda kernel into the DynamicCuda kernel.
⚡ Triton kernel is now auto-padded for max model support.
⚡ Dynamic quantization now supports both positive (`+:`, the default) and negative (`-:`) matching; negative matches allow the matched modules to be skipped entirely for quantization.
⚡ Added auto-kernel fallback for unsupported kernel/module pairs.
🐛 Fixed auto-Marlin kernel selection.
🗑 Deprecated saving of the Marlin weight format. The Marlin kernel auto-converts the gptq format to Marlin at runtime, and the gptq format retains maximum kernel flexibility, including Marlin kernel support.
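The positive/negative matching for Dynamic quantization can be sketched in plain Python. This is an illustrative sketch only: the rule keys, the `+:`/`-:` prefix handling, and the `resolve_dynamic` helper are assumptions about the matching behavior described above, not GPTQModel's actual implementation.

```python
import re

def resolve_dynamic(dynamic_rules, module_name):
    """Illustrative matcher: '+:'-prefixed (or unprefixed) regex keys
    apply per-module quantization overrides; '-:'-prefixed keys mark
    matching modules to be skipped entirely.
    Returns an override dict, or None if the module should be skipped."""
    for rule, overrides in dynamic_rules.items():
        if rule.startswith("-:"):
            if re.match(rule[2:], module_name):
                return None  # negative match: skip this module
        else:
            pattern = rule[2:] if rule.startswith("+:") else rule
            if re.match(pattern, module_name):
                return overrides  # positive match: apply overrides
    return {}  # no rule matched: fall back to the global quantize config

# Hypothetical rules: 8-bit for MLP modules, skip lm_head entirely.
rules = {
    r"+:.*\.mlp\..*": {"bits": 8},
    r"-:.*\.lm_head$": {},
}
```

Under these assumed rules, an MLP projection resolves to the 8-bit override, `lm_head` is skipped, and any other module falls through to the global config.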
Lots of internal refactoring and cleanup in preparation for the transformers/optimum/peft upstream PR merges.
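The Triton auto-padding called out above can be illustrated with a small sketch. The alignment constants (group_size, a 32-wide kernel block) and the zero-padding strategy are assumptions for illustration only, not the kernel's actual constraints.

```python
def pad_to_multiple(n, multiple):
    """Round n up to the nearest multiple (no-op if already aligned)."""
    return -(-n // multiple) * multiple

def padded_shape(in_features, out_features, group_size=128, block=32):
    # Assumption for illustration: in_features must align to group_size
    # and out_features to the kernel block size; the extra rows/columns
    # would be zero-padded so outputs are unchanged.
    return (pad_to_multiple(in_features, group_size),
            pad_to_multiple(out_features, block))
```

For example, a layer whose dimensions are already aligned passes through unchanged, while an odd-sized layer is rounded up to the next supported shape.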
- Remove Marlin old kernel and Marlin format saving. Marlin[new] is still supported via inference. by @CSY-ModelCloud in #714
- Remove marlin(old) kernel codes & do ruff by @CSY-ModelCloud in #719
- [FIX] gptq v2 load by @ZX-ModelCloud in #724
- Add hf_convert_gptq_v1_to_v2_format, hf_convert_gptq_v2_to_v1_format,… by @LRL-ModelCloud in #727
- If using the ipex quant linear, no need to convert by @LRL-ModelCloud in #730
- hf_select_quant_linear add device_map by @LRL-ModelCloud in #732
- Add TorchQuantLinear by @ZX-ModelCloud in #735
- Add QUANT_TYPE in qlinear by @jiqing-feng in #736
- Replace error with warning for Intel CPU check by @CSY-ModelCloud in #737
- Add BACKEND.AUTO_CPU by @LRL-ModelCloud in #739
- Fix ipex linear check by @jiqing-feng in #741
- Fix select quant linear by @jiqing-feng in #742
- Now meta.quantizer value can be an array by @ZX-ModelCloud in #744
- Receive checkpoint_format argument by @ZX-ModelCloud in #747
- Modify hf convert gptq v2 to v1 format by @ZX-ModelCloud in #749
- update score max negative delta by @CSY-ModelCloud in #748
- [CI] max parallel jobs 10 by @CSY-ModelCloud in #751
- hymba got high score by @CSY-ModelCloud in #752
- hf_select_quant_linear() always set pack=True by @ZX-ModelCloud in #754
- Refactor CudaQuantLinear to DynamicCudaQuantLinear by @ZX-ModelCloud in #759
- Remove filename prefix on qlinear dir by @ZX-ModelCloud in #760
- Replace Nvidia-smi with devicesmi by @CSY-ModelCloud in #761
- Fix XPU training by @jiqing-feng in #763
- Fix auto marlin kernel selection by @CSY-ModelCloud in #765
- Add BaseQuantLinear SUPPORTS_TRAINING declaration by @LRL-ModelCloud in #766
- Add Eval() api to support LM-Eval or EvalPlus benchmark harnesses by @CL-ModelCloud in #750
- Fix validate_device by @LRL-ModelCloud in #769
- Force BaseQuantLinear properties to be explicitly declared by all QuantLinears by @ZX-ModelCloud in #767
- Convert str backend to enum backend by @LRL-ModelCloud in #772
- Remove nested list in dict by @CSY-ModelCloud in #774
- Fix training qlinear by @LRL-ModelCloud in #777
- Check kernel by @CSY-ModelCloud in #764
- BACKEND.AUTO if backend is None by @LRL-ModelCloud in #781
- Fix lm_head quantize test by @CSY-ModelCloud in #784
- Fix exllama not supporting 8-bit by @CSY-ModelCloud in #790
- Use set() to avoid calling torch twice by @CSY-ModelCloud in #791
- Fix ipex cpu backend import error and excessive logging by @jiqing-feng in #793
- Eval API opt by @CL-ModelCloud in #794
- Fixed ipex linear param check and logging once by @jiqing-feng in #795
- Check device before sync by @LRL-ModelCloud in #796
- Only AUTO will try other quant linears by @CSY-ModelCloud in #797
- Add SUPPORTS_AUTO_PADDING property to QuantLinear by @LRL-ModelCloud in #799
- Dynamic now support skipping modules/layers by @CSY-ModelCloud in #804
- Fix modules being looped over even when skipped by @CSY-ModelCloud in #806
- Make Triton kernel auto-pad on features/group_size by @LRL-ModelCloud in #808
Full Changelog: v1.3.1...v1.4.0