GPTQModel v0.9.8
What's Changed
- Marlin end-to-end in/out feature padding for max model support
- Run quantized models (`FORMAT.GPTQ`) directly using the fast vLLM backend! (see the sketch after this list)
- Run quantized models (`FORMAT.GPTQ`) directly using the fast SGLang backend!
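A minimal loading sketch for the new backends, assuming the v0.9.x `GPTQModel.from_quantized` API and `BACKEND` enum import shown here (the model id is a placeholder, not from these notes):

```python
from gptqmodel import GPTQModel, BACKEND  # BACKEND import path is an assumption

# Load an existing FORMAT.GPTQ checkpoint and serve it through the fast
# vLLM backend; swap BACKEND.VLLM for BACKEND.SGLANG to use SGLang instead.
model = GPTQModel.from_quantized(
    "ModelCloud/example-gptq-4bit",  # placeholder model id for illustration
    backend=BACKEND.VLLM,
)
```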
- 🚀 🚀 [CORE] Marlin end-to-end in/out feature padding by @LRL-ModelCloud in #183 #192
- 🚀 🚀 [CORE] Add vLLM Backend for FORMAT.GPTQ by @PZS-ModelCloud in #190
- 🚀 🚀 [CORE] Add SGLang Backend by @PZS-ModelCloud in #191
- 🚀 [CORE] Use Triton v2 to pack gptq/gptqv2 formats by @LRL-ModelCloud in #202
- ✨ [CLEANUP] remove triton warmup by @Qubitium in #200
- 👾 [FIX] 8bit choosing wrong packer by @Qubitium in #199
- ✨ [CI] [CLEANUP] Improve Unit Tests by CSY, PSY, and ZYC
- ✨ [DOC] Consolidate Examples by ZYC in #225
Full Changelog: v0.9.7...v0.9.8