GPTQModel v0.9.2
What's Changed
Added auto-padding of model in/out-features for exllama and exllama v2. Fixed quantization of OPT and DeepSeek V2-Lite models. Fixed inference for DeepSeek V2-Lite.
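The padding logic can be illustrated with a minimal sketch: the exllama kernels require feature dimensions aligned to a fixed multiple, so undersized layers are rounded up (the multiple of 32 below is an assumption for illustration, not the library's exact constant):

```python
def padded_size(features: int, multiple: int = 32) -> int:
    """Round a feature count up to the next kernel-required multiple.
    Padded rows/columns are zero-filled and contribute nothing."""
    return features + (-features) % multiple

print(padded_size(50))  # 64
print(padded_size(70))  # 96
print(padded_size(64))  # 64 (already aligned, no padding needed)
```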
- ✨ [FEATURE/FIX] Padding infeatures/outfeatures for exllama, exllama v2, and marlin by @Qubitium @LRL-ModelCloud in #98
- ✨ [REFACTOR] Remove use_cuda_fp16 argument by @ZX-ModelCloud in #97
- ✨ [REFACTOR] model.post_init by @PZS-ModelCloud in #103
- ✨ [BUILD] Add UV PIP usage instructions by @CL-ModelCloud in #114
- 👾 [FIX] DeepSeek-V2-Lite load by @LRL-ModelCloud in #112
- 👾 [FIX] Opt fc1/fc2 layer modules should not be quantized by @Qubitium in #118
New Contributors
- @CL-ModelCloud made their first contribution in #114
Full Changelog: v0.9.1...v0.9.2