[Doc] Improve installation signposting #12575
Conversation
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
I will check later tonight, thanks.
Looks better, thanks for improving this! We can merge this after @mgoin also approves
Thank you very much, this is great!
- Make device tab names more explicit
- Add a comprehensive list of devices to https://docs.vllm.ai/en/latest/getting_started/installation/index.html
- Add `attention` blocks to the intro of all devices that don't have pre-built wheels/images

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
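For illustration, an `attention` block of the kind described above might look like the following in the docs source. This is a sketch only: it assumes the vLLM docs are built with Sphinx and MyST-flavored Markdown (with the colon-fence extension enabled), and the wording shown is an example, not the exact text added in this PR.

```markdown
:::{attention}
There are no pre-built wheels or images for this device, so you must
build vLLM from source to use it.
:::
```

When rendered, MyST/Sphinx turns this directive into a highlighted call-out box at the top of the device's installation page, which is what makes the "no pre-built wheels/images" caveat hard to miss.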