GPTQModel v0.9.8

@Qubitium Qubitium released this 13 Jul 12:55
· 235 commits to main since this release
0d263f3

What's Changed

  1. Marlin: end-to-end padding of in/out features for maximum model support
  2. Run quantized models (FORMAT.GPTQ) directly using the fast vLLM backend!
  3. Run quantized models (FORMAT.GPTQ) directly using the fast SGLang backend!
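
To illustrate the idea behind item 1: a kernel that only accepts layer shapes aligned to fixed tile sizes can still serve other models if the in/out feature dimensions are rounded up to the next aligned size and the extra entries are zero-filled. The sketch below only computes the padded shape. The helper names `round_up` and `padded_shape` are hypothetical, and the multiples 128 and 256 are assumed for illustration; they are not taken from this release.

```python
def round_up(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple (value itself if already aligned)."""
    return ((value + multiple - 1) // multiple) * multiple


def padded_shape(in_features: int, out_features: int,
                 in_multiple: int = 128, out_multiple: int = 256):
    """Return the (in, out) feature sizes after padding to the assumed alignment.

    The default multiples are illustrative assumptions, not values confirmed
    by the release notes.
    """
    return round_up(in_features, in_multiple), round_up(out_features, out_multiple)


# A layer that is already aligned is left unchanged; an unaligned one grows.
aligned = padded_shape(4096, 4096)
unaligned = padded_shape(4304, 1000)
print(aligned, unaligned)
```

Padding on both the input and output side is what makes the approach end to end: the padded output of one layer has to line up with the padded input of the next, and the extra columns must be dropped again before results are returned.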

Full Changelog: v0.9.7...v0.9.8