sync release with main @ v0.5.0.post1-99-g8720c92e (#63)
Merged
openshift-merge-bot[bot] merged 772 commits into opendatahub-io:release from dtrifiro:sync-release-with-main on Jun 21, 2024
+39,374 −15,406
Commits
This pull request is big! Only the most recent 250 commits are shown.
Commits on Jun 1, 2024
- [Minor] Fix the path typo in loader.py: save_sharded_states.py -> save_sharded_state.py (vllm-project#5151)
Commits on Jun 4, 2024
- [Bugfix]: During testing, use pytest monkeypatch for safely overriding the env var that indicates the vLLM backend (vllm-project#5210)
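The pytest monkeypatch entry above describes a testing pattern worth spelling out: overriding an environment variable only for the duration of a test, so the override cannot leak into other tests. A minimal sketch of that pattern; the variable name VLLM_BACKEND and the selected_backend helper are placeholders, not vLLM's actual names:

```python
import os

import pytest


def selected_backend() -> str:
    # Placeholder for the code under test that consults the backend env var.
    return os.environ.get("VLLM_BACKEND", "default")


def test_backend_env_override(monkeypatch: pytest.MonkeyPatch) -> None:
    # monkeypatch.setenv sets the variable for this test only and restores
    # the previous value automatically when the test finishes.
    monkeypatch.setenv("VLLM_BACKEND", "FLASHINFER")
    assert selected_backend() == "FLASHINFER"
```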
Commits on Jun 5, 2024
- [Frontend] OpenAI API server: Add `add_special_tokens` to ChatCompletionRequest (default False) (vllm-project#5278)
- [Kernel] Add GPU architecture guards to the CUTLASS w8a8 kernels to reduce binary size (vllm-project#5157)
- [Bugfix][Frontend/Core] Don't log exception when AsyncLLMEngine gracefully shuts down. (vllm-project#5290)
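The add_special_tokens entry above adds a vLLM-specific field to the OpenAI-compatible chat schema. A hedged client-side sketch using the OpenAI Python client's extra_body passthrough; the base URL, API key, and model name are placeholders:

```python
from openai import OpenAI

# A locally running vLLM OpenAI-compatible server (placeholder address).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "Hello"}],
    # Non-standard fields are forwarded to the server via extra_body;
    # per the commit title, the server-side default is False.
    extra_body={"add_special_tokens": True},
)
print(response.choices[0].message.content)
```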
Commits on Jun 10, 2024
- [Feature][Frontend]: Continued `stream_options` implementation also in CompletionRequest (vllm-project#5319)
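stream_options is the OpenAI streaming field that asks the server to append a final usage chunk; the entry above extends vLLM's existing chat-completions support to the legacy completions endpoint. A sketch of a streaming request that opts in, with placeholder URL and model name, passing the field through extra_body to stay client-version agnostic:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder

stream = client.completions.create(
    model="my-model",  # placeholder
    prompt="The capital of France is",
    max_tokens=16,
    stream=True,
    # Ask the server to emit one final chunk containing token usage.
    extra_body={"stream_options": {"include_usage": True}},
)
for chunk in stream:
    if chunk.choices:  # the trailing usage-only chunk has no choices
        print(chunk.choices[0].text, end="")
    usage = getattr(chunk, "usage", None)
    if usage is not None:
        print(f"\n[usage: {usage.total_tokens} tokens]")
```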
Commits on Jun 11, 2024
- [Bugfix] OpenAI entrypoint limits logprobs while ignoring server defined --max-logprobs (vllm-project#5312)
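The --max-logprobs entry above is about server-side validation: a request asking for more logprobs than the server was configured to allow should be rejected rather than silently honored. A simplified, hypothetical validation helper illustrating the idea (not vLLM's actual code):

```python
from typing import Optional


def validate_logprobs(requested: Optional[int], max_logprobs: int) -> Optional[int]:
    """Reject a request whose logprobs exceed the server's --max-logprobs."""
    if requested is None:
        return None
    if requested > max_logprobs:
        raise ValueError(
            f"Requested logprobs={requested}, but the server was started "
            f"with --max-logprobs={max_logprobs}."
        )
    return requested
```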
Commits on Jun 12, 2024
- [CI/Build] Add `is_quant_method_supported` to control quantization test configurations (vllm-project#5253)
- Revert "[CI/Build] Add `is_quant_method_supported` to control quantization test configurations" (vllm-project#5463)
- [ci] Add AMD, Neuron, Intel tests for AWS CI and turn off default soft fail for GPU tests (vllm-project#5464)
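The is_quant_method_supported helper named in the entries above (added, reverted, then re-landed the next day) gates quantization tests on the GPU's compute capability. A rough, hypothetical reimplementation of the idea; the capability thresholds below are illustrative, not vLLM's exact table:

```python
import torch

# Illustrative minimum compute capabilities (major * 10 + minor) per method.
_MIN_CAPABILITY = {
    "awq": 75,
    "gptq": 60,
    "marlin": 80,
    "fp8": 89,
}


def is_quant_method_supported(method: str) -> bool:
    """Return True if the current GPU can run tests for this quantization method."""
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability()
    return major * 10 + minor >= _MIN_CAPABILITY.get(method, 2**31)
```

A test would then typically be skipped on unsupported hardware with something like pytest.mark.skipif(not is_quant_method_supported("fp8"), reason="fp8 not supported on this GPU").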
Commits on Jun 13, 2024
- [CI/Build][REDO] Add is_quant_method_supported to control quantization test configurations (vllm-project#5466)
Commits on Jun 14, 2024
- [CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with `perf-benchmarks` label (vllm-project#5073)
Commits on Jun 18, 2024
- [Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier (vllm-project#5131)
- [Misc] Add channel-wise quantization support for w8a8 dynamic per token activation quantization (vllm-project#5542)
- [Bugfix] Fix for inconsistent behaviour related to sampling and repetition penalties (vllm-project#5639)
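Channel-wise w8a8 quantization, as in the #5542 entry above, keeps one scale per output channel of the weight matrix rather than a single tensor-wide scale, which preserves accuracy for channels with very different magnitudes. A generic sketch of symmetric per-channel int8 weight quantization (an illustration of the concept, not vLLM's kernel code):

```python
import torch


def quantize_weight_per_channel(weight: torch.Tensor):
    """Symmetric int8 quantization with one scale per output channel (row)."""
    # weight: [out_features, in_features]; pick each row's scale so its
    # largest absolute value maps to 127.
    max_abs = weight.abs().amax(dim=1, keepdim=True)
    scale = max_abs.clamp(min=1e-8) / 127.0
    q = torch.round(weight / scale).clamp(-127, 127).to(torch.int8)
    return q, scale  # dequantize as q.float() * scale
```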
Commits on Jun 19, 2024
- [Bugfix][CI/Build][AMD][ROCm]Fixed the cmake build bug which generate garbage on certain devices (vllm-project#5641)
- [Frontend][Bugfix] Fix preemption_mode -> preemption-mode for CLI arg in arg_utils.py (vllm-project#5688)
- [Misc] Add per channel support for static activation quantization; update w8a8 schemes to share base classes (vllm-project#5650)
Commits on Jun 20, 2024
- [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (vllm-project#5718)
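FlexibleArgumentParser, from the last entry above, lets users write either --preemption-mode or --preemption_mode on the command line. A minimal sketch of the underscore/dash normalization idea on top of argparse (an illustration of the approach, not the exact vLLM implementation):

```python
import argparse
import sys


class FlexibleArgumentParser(argparse.ArgumentParser):
    """Accepts --some_flag as an alias for --some-flag by normalizing argv."""

    def parse_args(self, args=None, namespace=None):
        if args is None:
            args = sys.argv[1:]
        normalized = []
        for arg in args:
            if arg.startswith("--"):
                # Normalize only the option name, not a value after '='.
                name, sep, value = arg[2:].partition("=")
                arg = "--" + name.replace("_", "-") + sep + value
            normalized.append(arg)
        return super().parse_args(normalized, namespace)


parser = FlexibleArgumentParser()
parser.add_argument("--preemption-mode", default="recompute")
print(parser.parse_args(["--preemption_mode", "swap"]).preemption_mode)  # swap
```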