Sync with upstream@v0.5.4-7-g9118217f #120

Closed

dtrifiro wants to merge 168 commits into opendatahub-io:main from dtrifiro:sync-with-0.5.4

+21,667 -8,467

Commits on Jul 23, 2024

[Misc] Add ignored layers for fp8 quantization (vllm-project#6657)
mgoin
authored
[Frontend] Add Usage data in each chunk for chat_serving. vllm-project#6540 (vllm-project#6652)
yecohn
authored
[Model] Pipeline Parallel Support for DeepSeek v2 (vllm-project#6519)
tjohnson31415
authored
Bump transformers version for Llama 3.1 hotfix and patch Chameleon (vllm-project#6690)
ywang96
authored
[build] relax wheel size limit (vllm-project#6704)
youkaichao
authored
[CI] Add smoke test for non-uniform AutoFP8 quantization (vllm-project#6702)
mgoin
authored
[Bugfix] StatLoggers: cache spec decode metrics when they get collected. (vllm-project#6645)
tdoublep
authored
[bitsandbytes]: support read bnb pre-quantized model (vllm-project#5753)
thesues and mgoin
authored

Commits on Jul 24, 2024

Commits on Jul 25, 2024

Commits on Jul 26, 2024

Commits on Jul 27, 2024

Commits on Jul 28, 2024

[Misc] Pass cutlass_fp8_supported correctly in fbgemm_fp8 (vllm-project#6871)
zeyugao
authored

Commits on Jul 29, 2024

Commits on Jul 30, 2024

Commits on Jul 31, 2024

Commits on Aug 1, 2024

Commits on Aug 2, 2024

Commits on Aug 3, 2024

Commits on Aug 4, 2024

Commits on Aug 5, 2024