Release v0.6.2 · PygmalionAI/aphrodite-engine

What's Changed

feat: FP8 quantization support for AMD ROCm by @AlpinDale in #729
feat: add experts_int8 support by @AlpinDale in #730
chore: move update_flash_attn_metadata to attn backend by @AlpinDale in #731
chore: register lora functions as torch ops by @AlpinDale in #732
feat: dynamo support for ScalarType by @AlpinDale in #733
fix: types in AQLM and GGUF for dynamo support by @AlpinDale in #736
fix: custom_ar check by @AlpinDale in #737
fix: clear engine ref in RPC server by @AlpinDale in #738
fix: use nvml to get consistent device names by @AlpinDale in #739
feat: add Exaone model support by @shing100 in #743
fix: minor bug fixes & clean-ups by @AlpinDale in #744
chore: refactor MultiModalConfig initialization and profiling by @AlpinDale in #745
chore: various TPU fixes and optimizations by @AlpinDale in #746
fix: metrics endpoint with RPC server by @AlpinDale in #747
chore: refactor llama3 rope by @AlpinDale in #748
feat: add XTC Sampling by @AlpinDale in #740
ci: fix dep install using pnpm by @ahme-dev in #749
ci: fix docs deployment by @ahme-dev in #750
chore: re-enable custom token bans by @AlpinDale in #751
feat: bring back dynatemp by @AlpinDale in #754
feat: quant_llm support by @AlpinDale in #755
fix: add pandas to requirements by @AlpinDale in #756
docs: update readme and quant docs by @AlpinDale in #757
ci: bump version to 0.6.2 by @AlpinDale in #758

Full Changelog: v0.6.1.post1...v0.6.2