v0.6.2
What's Changed
- feat: FP8 quantization support for AMD ROCm by @AlpinDale in #729
- feat: add experts_int8 support by @AlpinDale in #730
- chore: move update_flash_attn_metadata to attn backend by @AlpinDale in #731
- chore: register lora functions as torch ops by @AlpinDale in #732
- feat: dynamo support for ScalarType by @AlpinDale in #733
- fix: types in AQLM and GGUF for dynamo support by @AlpinDale in #736
- fix: custom_ar check by @AlpinDale in #737
- fix: clear engine ref in RPC server by @AlpinDale in #738
- fix: use nvml to get consistent device names by @AlpinDale in #739
- feat: add Exaone model support by @shing100 in #743
- fix: minor bug fixes & clean-ups by @AlpinDale in #744
- chore: refactor MultiModalConfig initialization and profiling by @AlpinDale in #745
- chore: various TPU fixes and optimizations by @AlpinDale in #746
- fix: metrics endpoint with RPC server by @AlpinDale in #747
- chore: refactor llama3 rope by @AlpinDale in #748
- feat: add XTC Sampling by @AlpinDale in #740
- ci: fix dep install using pnpm by @ahme-dev in #749
- ci: fix docs deployment by @ahme-dev in #750
- chore: re-enable custom token bans by @AlpinDale in #751
- feat: bring back dynatemp by @AlpinDale in #754
- feat: quant_llm support by @AlpinDale in #755
- fix: add pandas to requirements by @AlpinDale in #756
- docs: update readme and quant docs by @AlpinDale in #757
- ci: bump version to 0.6.2 by @AlpinDale in #758
New Contributors
Full Changelog: v0.6.1.post1...v0.6.2