GPTQModel v1.4.1
What's Changed
⚡ Added Qwen2-VL model support.
⚡ New mse quantization control exposed in QuantizeConfig.
⚡ New GPTQModel.patch_hf() monkey-patch API to allow Transformers/Optimum/Peft to use GPTQModel while upstream PRs are pending.
⚡ New GPTQModel.patch_vllm() monkey-patch API to allow vLLM to correctly load dynamic/mixed GPTQ quantized models.
- Add warning for vllm/sglang when using dynamic feature by @CSY-ModelCloud in #810
- Update Eval() usage sample by @CL-ModelCloud in #819
- auto select best device by @CSY-ModelCloud in #822
- Fix error msg by @CSY-ModelCloud in #823
- allow pass meta_quantizer from save() by @CSY-ModelCloud in #824
- Quantconfig add mse field by @CL-ModelCloud in #825
- [MODEL] add qwen2_vl support by @LRL-ModelCloud in #826
- check cuda when there's only cuda device by @CSY-ModelCloud in #830
- Update lm-eval test by @CL-ModelCloud in #831
- add patch_vllm() by @ZX-ModelCloud in #829
- Monkey patch HF transformer/optimum/peft support by @CSY-ModelCloud in #818
- auto patch vllm by @CSY-ModelCloud in #837
- Fix lm-eval API BUG by @CL-ModelCloud in #838
- [FIX] dynamic get "desc_act" error by @ZX-ModelCloud in #841
- BaseModel add supports_desc_act by @ZX-ModelCloud in #842
- [FIX] should local import patch_vllm() by @ZX-ModelCloud in #844
- Mod vllm generate by @LRL-ModelCloud in #833
- fix patch_vllm by @LRL-ModelCloud in #850
Full Changelog: v1.4.0...v1.4.1