
Conversation

@bringlein (Collaborator) commented:

Updating to enable use of vllm-project/vllm#14071.

vLLM serving and benchmarking, run on A100 and MI250:

# inside docker
VLLM_USE_V1=1 VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1 vllm serve /models/llama3.1-8b/instruct/ --disable-log-requests

# new shell inside same container
python3 /scripts/bench_vllm_user_range.py llama3.1-8b/instruct ibm_triton_attn

Please note this only works if the container is built with make dev or make rocm.
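
For reference, since bench_vllm_user_range.py lives inside the container image, here is a minimal, hypothetical sketch of what a user-range benchmark against the served model could look like. It assumes the default vLLM OpenAI-compatible completions endpoint on http://localhost:8000 and is not the actual script:

# Hypothetical sketch (not bench_vllm_user_range.py): sweep the number of
# concurrent users against the vLLM server's OpenAI-compatible completions
# endpoint and report mean request latency per user count. Assumes the
# server started by `vllm serve` is listening on the default port 8000.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/v1/completions"  # default `vllm serve` endpoint
MODEL = "/models/llama3.1-8b/instruct/"       # must match the served model path

def one_request() -> float:
    """Send a single completion request and return its latency in seconds."""
    payload = {"model": MODEL, "prompt": "Hello, world!", "max_tokens": 64}
    start = time.perf_counter()
    resp = requests.post(URL, json=payload, timeout=300)
    resp.raise_for_status()
    return time.perf_counter() - start

for users in (1, 2, 4, 8, 16, 32):
    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies = list(pool.map(lambda _: one_request(), range(users)))
    print(f"{users:3d} users: mean latency {sum(latencies) / len(latencies):.2f} s")

Sweeping the worker count this way approximates an increasing number of concurrent users hitting the same endpoint, which is the shape of measurement the command above produces.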

Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
@jvlunteren (Collaborator) left a comment:


Looks good to me.

@bringlein merged commit d11132b into main on Mar 24, 2025 (1 check passed).
@bringlein deleted the ngl_update_vllm_03-21 branch on March 24, 2025 at 18:04.
