sync release with main @ v0.5.0.post1-99-g8720c92e (#63)
Merged
openshift-merge-bot[bot] merged 772 commits into opendatahub-io:release from dtrifiro:sync-release-with-main on Jun 21, 2024
+39,374 −15,406
Commits
This pull request is big! Only the most recent 250 commits are shown.
Commits on Jun 1, 2024
- [Minor] Fix the path typo in loader.py: save_sharded_states.py -> save_sharded_state.py (vllm-project#5151)
Commits on Jun 4, 2024
- [Bugfix]: During testing, use pytest monkeypatch for safely overriding the env var that indicates the vLLM backend (vllm-project#5210)
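The pytest monkeypatch entry above describes a testing pattern worth spelling out: overriding an environment variable only for the duration of a test, so the override cannot leak into other tests. A minimal sketch of that pattern; the variable name VLLM_BACKEND and the selected_backend helper are placeholders, not vLLM's actual names:

```python
import os

import pytest


def selected_backend() -> str:
    # Placeholder for the code under test that consults the backend env var.
    return os.environ.get("VLLM_BACKEND", "default")


def test_backend_env_override(monkeypatch: pytest.MonkeyPatch) -> None:
    # monkeypatch.setenv sets the variable for this test only and restores
    # the previous value automatically when the test finishes.
    monkeypatch.setenv("VLLM_BACKEND", "FLASHINFER")
    assert selected_backend() == "FLASHINFER"
```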
Commits on Jun 5, 2024
- [Frontend] OpenAI API server: Add `add_special_tokens` to ChatCompletionRequest (default False) (vllm-project#5278)
- [Kernel] Add GPU architecture guards to the CUTLASS w8a8 kernels to reduce binary size (vllm-project#5157)
- [Bugfix][Frontend/Core] Don't log exception when AsyncLLMEngine gracefully shuts down. (vllm-project#5290)
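The add_special_tokens entry above adds a vLLM-specific field to the OpenAI-compatible chat schema. A hedged client-side sketch using the OpenAI Python client's extra_body passthrough; the base URL, API key, and model name are placeholders:

```python
from openai import OpenAI

# A locally running vLLM OpenAI-compatible server (placeholder address).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "Hello"}],
    # Non-standard fields are forwarded to the server via extra_body;
    # per the commit title, the server-side default is False.
    extra_body={"add_special_tokens": True},
)
print(response.choices[0].message.content)
```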
Commits on Jun 10, 2024
- [Feature][Frontend]: Continued `stream_options` implementation also in CompletionRequest (vllm-project#5319)
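stream_options is the OpenAI streaming field that asks the server to append a final usage chunk; the entry above extends vLLM's existing chat-completions support to the legacy completions endpoint. A sketch of a streaming request that opts in, with placeholder URL and model name, passing the field through extra_body to stay client-version agnostic:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder

stream = client.completions.create(
    model="my-model",  # placeholder
    prompt="The capital of France is",
    max_tokens=16,
    stream=True,
    # Ask the server to emit one final chunk containing token usage.
    extra_body={"stream_options": {"include_usage": True}},
)
for chunk in stream:
    if chunk.choices:  # the trailing usage-only chunk has no choices
        print(chunk.choices[0].text, end="")
    usage = getattr(chunk, "usage", None)
    if usage is not None:
        print(f"\n[usage: {usage.total_tokens} tokens]")
```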
Commits on Jun 11, 2024
- [Bugfix] OpenAI entrypoint limits logprobs while ignoring server defined --max-logprobs (vllm-project#5312)
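The --max-logprobs entry above is about server-side validation: a request asking for more logprobs than the server was configured to allow should be rejected rather than silently honored. A simplified, hypothetical validation helper illustrating the idea (not vLLM's actual code):

```python
from typing import Optional


def validate_logprobs(requested: Optional[int], max_logprobs: int) -> Optional[int]:
    """Reject a request whose logprobs exceed the server's --max-logprobs."""
    if requested is None:
        return None
    if requested > max_logprobs:
        raise ValueError(
            f"Requested logprobs={requested}, but the server was started "
            f"with --max-logprobs={max_logprobs}."
        )
    return requested
```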
Commits on Jun 12, 2024
- [CI/Build] Add `is_quant_method_supported` to control quantization test configurations (vllm-project#5253)
- Revert "[CI/Build] Add `is_quant_method_supported` to control quantization test configurations" (vllm-project#5463)
- [ci] Add AMD, Neuron, Intel tests for AWS CI and turn off default soft fail for GPU tests (vllm-project#5464)
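The is_quant_method_supported helper named in the entries above (added, reverted, then re-landed the next day) gates quantization tests on the GPU's compute capability. A rough, hypothetical reimplementation of the idea; the capability thresholds below are illustrative, not vLLM's exact table:

```python
import torch

# Illustrative minimum compute capabilities (major * 10 + minor) per method.
_MIN_CAPABILITY = {
    "awq": 75,
    "gptq": 60,
    "marlin": 80,
    "fp8": 89,
}


def is_quant_method_supported(method: str) -> bool:
    """Return True if the current GPU can run tests for this quantization method."""
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability()
    return major * 10 + minor >= _MIN_CAPABILITY.get(method, 2**31)
```

A test would then typically be skipped on unsupported hardware with something like pytest.mark.skipif(not is_quant_method_supported("fp8"), reason="fp8 not supported on this GPU").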
Commits on Jun 13, 2024
- [CI/Build][REDO] Add is_quant_method_supported to control quantization test configurations (vllm-project#5466)
Commits on Jun 14, 2024
- [CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with `perf-benchmarks` label (vllm-project#5073)
Commits on Jun 18, 2024
- [Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier (vllm-project#5131)
- [Misc] Add channel-wise quantization support for w8a8 dynamic per token activation quantization (vllm-project#5542)
- [Bugfix] Fix for inconsistent behaviour related to sampling and repetition penalties (vllm-project#5639)
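Channel-wise w8a8 quantization, as in the #5542 entry above, keeps one scale per output channel of the weight matrix rather than a single tensor-wide scale, which preserves accuracy for channels with very different magnitudes. A generic sketch of symmetric per-channel int8 weight quantization (an illustration of the concept, not vLLM's kernel code):

```python
import torch


def quantize_weight_per_channel(weight: torch.Tensor):
    """Symmetric int8 quantization with one scale per output channel (row)."""
    # weight: [out_features, in_features]; pick each row's scale so its
    # largest absolute value maps to 127.
    max_abs = weight.abs().amax(dim=1, keepdim=True)
    scale = max_abs.clamp(min=1e-8) / 127.0
    q = torch.round(weight / scale).clamp(-127, 127).to(torch.int8)
    return q, scale  # dequantize as q.float() * scale
```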
Commits on Jun 19, 2024
- [Bugfix][CI/Build][AMD][ROCm]Fixed the cmake build bug which generate garbage on certain devices (vllm-project#5641)
- [Frontend][Bugfix] Fix preemption_mode -> preemption-mode for CLI arg in arg_utils.py (vllm-project#5688)
- [Misc] Add per channel support for static activation quantization; update w8a8 schemes to share base classes (vllm-project#5650)
Commits on Jun 20, 2024
- [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (vllm-project#5718)
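FlexibleArgumentParser, from the last entry above, lets users write either --preemption-mode or --preemption_mode on the command line. A minimal sketch of the underscore/dash normalization idea on top of argparse (an illustration of the approach, not the exact vLLM implementation):

```python
import argparse
import sys


class FlexibleArgumentParser(argparse.ArgumentParser):
    """Accepts --some_flag as an alias for --some-flag by normalizing argv."""

    def parse_args(self, args=None, namespace=None):
        if args is None:
            args = sys.argv[1:]
        normalized = []
        for arg in args:
            if arg.startswith("--"):
                # Normalize only the option name, not a value after '='.
                name, sep, value = arg[2:].partition("=")
                arg = "--" + name.replace("_", "-") + sep + value
            normalized.append(arg)
        return super().parse_args(normalized, namespace)


parser = FlexibleArgumentParser()
parser.add_argument("--preemption-mode", default="recompute")
print(parser.parse_args(["--preemption_mode", "swap"]).preemption_mode)  # swap
```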