Sync release with main for RHOAI 2.12 #110
Merged
dtrifiro merged 393 commits into release from sync-release-with-main on Jul 26, 2024
+42,884 −11,710
Commits
This pull request is big! We're only showing the most recent 250 commits.
Commits on Jul 5, 2024
[VLM] Improve consistency between feature size calculation and dummy data for profiling (vllm-project#6146)
Commits on Jul 8, 2024
[Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support) (vllm-project#4888)
Commits on Jul 9, 2024
[hardware][cuda] use device id under CUDA_VISIBLE_DEVICES for get_device_capability (vllm-project#6216)
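The fix above concerns the distinction between logical and physical GPU ids: when CUDA_VISIBLE_DEVICES is set, the device indices a process sees are remapped onto the listed physical devices, so capability queries must use the logical index. A minimal sketch of that remapping, assuming integer device ids only (the helper name is hypothetical, not vLLM's actual code):

```python
import os

def logical_to_physical(logical_id: int) -> int:
    """Map a logical CUDA device id to its physical id, honoring
    CUDA_VISIBLE_DEVICES. Hypothetical helper for illustration;
    handles only comma-separated integer ids, not GPU UUIDs."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is None or visible.strip() == "":
        # No restriction: logical ids equal physical ids.
        return logical_id
    physical_ids = [int(x) for x in visible.split(",") if x.strip()]
    return physical_ids[logical_id]

# With CUDA_VISIBLE_DEVICES="2,3", logical device 0 is physical GPU 2.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"
print(logical_to_physical(0))  # → 2
print(logical_to_physical(1))  # → 3
```

Querying capabilities by physical id under such a mask is the kind of mismatch the referenced fix addresses: the process can only see (and must index by) the logical ids.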
Commits on Jul 10, 2024
[Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (vllm-project#5765)
Commits on Jul 15, 2024
[Bugfix] Benchmark serving script used global parameter 'args' in function 'sample_random_requests' (vllm-project#6428)
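The `sample_random_requests` fix above is a common Python pitfall: a function silently reading a module-level `args` object instead of an explicit parameter, so callers passing different settings are ignored. A generic sketch of the failure mode and the fix, with illustrative names rather than the benchmark script's actual code:

```python
import random

# Buggy pattern: the function reaches for a module-level `args`,
# so it breaks (NameError) or misbehaves when imported elsewhere.
def sample_random_requests_buggy(num):
    return [random.randint(1, args.max_len) for _ in range(num)]  # global!

# Fixed pattern: the dependency becomes an explicit parameter.
def sample_random_requests(num, max_len):
    return [random.randint(1, max_len) for _ in range(num)]

random.seed(0)
reqs = sample_random_requests(5, max_len=10)
print(reqs)  # five values, each between 1 and 10
```

Making the dependency explicit also makes the function importable and testable in isolation, which is typically the motivation for this kind of bugfix.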
[Bugfix] use float32 precision in samplers/test_logprobs.py for comparing with HF (vllm-project#6409)
Commits on Jul 18, 2024
[Bugfix] Update flashinfer.py with PagedAttention forwards - Fixes Gemma2 OpenAI Server Crash (vllm-project#6501)
Commits on Jul 20, 2024
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (vllm-project#6543)
Commits on Jul 21, 2024
[Spec Decode] Disable Log Prob serialization to CPU for spec decoding for both draft and target models. (vllm-project#6485)