
Conversation

@bringlein (Collaborator) commented:

Updating to enable use of vllm-project/vllm#14071.

vLLM serving and benchmarking, run on A100 and MI250:

# inside docker
VLLM_USE_V1=1 VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1 vllm serve /models/llama3.1-8b/instruct/ --disable-log-requests

# new shell inside same container
python3 /scripts/bench_vllm_user_range.py llama3.1-8b/instruct ibm_triton_attn

Please note this only works if the container is built with make dev or make rocm.
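
For reference, since bench_vllm_user_range.py lives inside the container image, here is a minimal, hypothetical sketch of what a user-range benchmark against the served model could look like. It assumes the default vLLM OpenAI-compatible completions endpoint on http://localhost:8000 and is not the actual script:

# Hypothetical sketch (not bench_vllm_user_range.py): sweep the number of
# concurrent users against the vLLM server's OpenAI-compatible completions
# endpoint and report mean request latency per user count. Assumes the
# server started by `vllm serve` is listening on the default port 8000.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/v1/completions"  # default `vllm serve` endpoint
MODEL = "/models/llama3.1-8b/instruct/"       # must match the served model path

def one_request() -> float:
    """Send a single completion request and return its latency in seconds."""
    payload = {"model": MODEL, "prompt": "Hello, world!", "max_tokens": 64}
    start = time.perf_counter()
    resp = requests.post(URL, json=payload, timeout=300)
    resp.raise_for_status()
    return time.perf_counter() - start

for users in (1, 2, 4, 8, 16, 32):
    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies = list(pool.map(lambda _: one_request(), range(users)))
    print(f"{users:3d} users: mean latency {sum(latencies) / len(latencies):.2f} s")

Sweeping the worker count this way approximates an increasing number of concurrent users hitting the same endpoint, which is the shape of measurement the command above produces.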

Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
@jvlunteren (Collaborator) left a comment:


Looks good to me.

@bringlein merged commit d11132b into main on Mar 24, 2025 (1 check passed).
@bringlein deleted the ngl_update_vllm_03-21 branch on March 24, 2025 at 18:04.
