GPTQModel v0.9.2
What's Changed
Added auto-padding of model in/out-features for exllama and exllama v2. Fixed quantization of OPT and DeepSeek V2-Lite models. Fixed inference for DeepSeek V2-Lite.
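The padding logic can be illustrated with a minimal sketch: the exllama kernels require feature dimensions aligned to a fixed multiple, so undersized layers are rounded up (the multiple of 32 below is an assumption for illustration, not the library's exact constant):

```python
def padded_size(features: int, multiple: int = 32) -> int:
    """Round a feature count up to the next kernel-required multiple.
    Padded rows/columns are zero-filled and contribute nothing."""
    return features + (-features) % multiple

print(padded_size(50))  # 64
print(padded_size(70))  # 96
print(padded_size(64))  # 64 (already aligned, no padding needed)
```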
- ✨ [FEATURE/FIX] Padding infeatures/outfeatures for exllama, exllama v2, and marlin by @Qubitium @LRL-ModelCloud in #98
- ✨ [REFACTOR] Remove use_cuda_fp16 argument by @ZX-ModelCloud in #97
- ✨ [REFACTOR] model.post_init by @PZS-ModelCloud in #103
- ✨ [BUILD] Add UV PIP usage instructions by @CL-ModelCloud in #114
- 👾 [FIX] DeepSeek-V2-Lite load by @LRL-ModelCloud in #112
- 👾 [FIX] Opt fc1/fc2 layer modules should not be quantized by @Qubitium in #118
New Contributors
- @CL-ModelCloud made their first contribution in #114
Full Changelog: v0.9.1...v0.9.2