GPTQModel v1.4.0

@Qubitium Qubitium released this 10 Dec 15:35
360a8e6

What's Changed

EvalPlus harness integration merged upstream. We now support both lm-eval and EvalPlus.
⚡ Added a Torch kernel implemented in pure PyTorch.
⚡ Refactored the Cuda kernel into the DynamicCuda kernel.
Triton kernel now auto-pads for maximum model support.
Dynamic quantization now supports both positive (+:, the default) and negative (-:) matching. Negative matching allows matched modules to be skipped entirely during quantization.
⚡ Added auto-kernel fallback for unsupported kernel/module pairs.
🐛 Fixed auto-Marlin kernel selection.
🗑 Deprecated saving in the Marlin weight format. The Marlin kernel auto-converts gptq format to Marlin at runtime, and the gptq format allows maximum kernel flexibility, including Marlin kernel support.
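
To illustrate the positive/negative matching described above, here is a minimal, self-contained sketch of how prefix-based rules could be resolved against module names. It does not call GPTQModel; the helper `resolve_rule`, its return values, and the example patterns and module names are assumptions made for illustration only, and the real implementation may differ.

```python
import re

# Prefixes named in the release notes: "+:" is positive (the default), "-:" is negative.
POSITIVE, NEGATIVE = "+:", "-:"


def resolve_rule(module_name, dynamic):
    """Hypothetical helper: decide what to do with one module.

    Returns a tuple (action, settings) where action is one of:
      "override" - a positive rule matched; settings holds the per-module overrides
      "skip"     - a negative rule matched; the module is not quantized at all
      "default"  - no rule matched; the base quantization config applies
    """
    for pattern, settings in dynamic.items():
        if pattern.startswith(NEGATIVE):
            regex, negative = pattern[len(NEGATIVE):], True
        elif pattern.startswith(POSITIVE):
            regex, negative = pattern[len(POSITIVE):], False
        else:
            # No prefix: treated as positive, since "+:" is the default.
            regex, negative = pattern, False

        if re.match(regex, module_name):
            return ("skip", None) if negative else ("override", settings)

    return ("default", None)


# Illustrative rules: override settings for layer 18 gate modules,
# and skip quantization for layer 19 gate modules.
dynamic = {
    r"+:.*\.18\..*gate.*": {"bits": 8, "group_size": 64},
    r"-:.*\.19\..*gate.*": {},
}

for name in (
    "model.layers.18.mlp.gate_proj",
    "model.layers.19.mlp.gate_proj",
    "model.layers.3.mlp.up_proj",
):
    print(name, resolve_rule(name, dynamic))
```

In this sketch the first matching rule wins, so rule order in the dictionary matters; that ordering policy is a design choice of the example, not a documented guarantee.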

Lots of internal refactoring and cleanup in preparation for the transformers/optimum/peft upstream PR merge.

Full Changelog: v1.3.1...v1.4.0