Sync with upstream@v0.5.4-7-g9118217f #120
Closed
dtrifiro wants to merge 168 commits into opendatahub-io:main from dtrifiro:sync-with-0.5.4
+21,667 -8,467
Commits
Commits on Jul 23, 2024
- authored
- authored
- authored
- authored
- authored
- authored
- authored
Commits on Jul 24, 2024
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
Commits on Jul 25, 2024
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- [Bugfix] Fix empty (nullptr) channelwise scales when loading wNa16 using compressed tensors (vllm-project#6798) authored
- authored
- authored
- authored
Commits on Jul 26, 2024
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- [Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (vllm-project#6125) authored
- authored
- authored
Commits on Jul 27, 2024
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
Commits on Jul 28, 2024
Commits on Jul 29, 2024
- authored
- authored
- authored
- authored
- authored
Commits on Jul 30, 2024
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
Commits on Jul 31, 2024
- [Speculative decoding] Add serving benchmark for llama3 70b + speculative decoding (vllm-project#6964) authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
Commits on Aug 1, 2024
- authored
- authored
- authored
- authored
- [Bugfix] Set SamplingParams.max_tokens for OpenAI requests if not provided by user (vllm-project#6954) authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
Commits on Aug 2, 2024
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
Commits on Aug 3, 2024
- authored
- authored
- authored
- authored
- authored
- authored
- authored
- authored
Commits on Aug 4, 2024
- authored
- authored
- [Bugfix] [SpecDecode] Default speculative_draft_tensor_parallel_size to 1 when using MLPSpeculator (vllm-project#7105) authored
- authored
- authored
Commits on Aug 5, 2024
- authored
- [Speculative decoding] Add periodic log with time spent in proposal/scoring/verification (vllm-project#6963) authored
- authored
- authored
- committed
- committed