Conversation

Contributor

@xiao-llm xiao-llm commented Aug 5, 2025

Purpose

Support the fp8 MFMA instruction with per-warp dynamic quantization of the Query tensor to improve performance: it removes the cost of fp8-to-fp16 data type conversion and improves MFMA throughput on MI300X and later accelerators.
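
As a rough host-side illustration of the per-warp scheme (a sketch, not the actual HIP kernel; the struct and function names are made up, and true e4m3 mantissa rounding is omitted), each 64-lane warp-sized chunk of the query gets one dynamic scale so its largest magnitude maps onto the FP8 maximum:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// FP8 E4M3 (the format targeted by the MFMA path) has a max magnitude of 448.
constexpr float kFp8E4M3Max = 448.0f;

struct QuantResult {
    std::vector<float> quantized; // values after a round trip through the FP8 range
    std::vector<float> scales;    // one dynamic dequantization scale per warp
};

// Quantize a query tile with one scale per warp-sized chunk (64 lanes on
// MI300-class CDNA3 GPUs), so the largest |q| in each chunk maps onto the
// FP8 maximum. Rounding to an actual e4m3 mantissa is deliberately omitted.
QuantResult quantize_per_warp(const std::vector<float>& q,
                              std::size_t warp_size = 64) {
    QuantResult out;
    out.quantized.resize(q.size());
    for (std::size_t start = 0; start < q.size(); start += warp_size) {
        const std::size_t end = std::min(q.size(), start + warp_size);
        float max_abs = 0.0f;
        for (std::size_t i = start; i < end; ++i)
            max_abs = std::max(max_abs, std::fabs(q[i]));
        const float scale = (max_abs > 0.0f) ? max_abs / kFp8E4M3Max : 1.0f;
        out.scales.push_back(scale);
        for (std::size_t i = start; i < end; ++i) {
            const float v = std::clamp(q[i] / scale, -kFp8E4M3Max, kFp8E4M3Max);
            out.quantized[i] = v * scale; // dequantize with the per-warp scale
        }
    }
    return out;
}
```

Because the scale is chosen per warp rather than per tensor, an outlier in one warp does not crush the quantization resolution of the others.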

Test Plan

  • Unit testing:
    export VLLM_ROCM_FP8_MFMA_PAGE_ATTN=1
    pytest -s tests/kernels/attention/test_attention.py
  • Benchmark: lm-eval-harness
  • LLM model: Meta-Llama-3.1-8B-Instruct
  • Dataset: wikitext and gsm8k

Sample script:
export CUDA_VISIBLE_DEVICES=0,1,2,3
export MODEL_DIR=Meta-Llama-3.1-8B-Instruct
export VLLM_USE_V1=0

lm_eval --model vllm \
  --model_args pretrained=$MODEL_DIR \
  --tasks gsm8k \
  --trust_remote_code \
  --batch_size 8

Test Result

  • GSM8K-Llama-3.1-8B: [results screenshot]
  • Wikitext-Llama-3.3-70B: [results screenshot]
  • Unit tests: [results screenshot]

(Optional) Documentation Update

@mergify

mergify bot commented Aug 5, 2025

⚠️ The sha of the head commit of this PR conflicts with #22221. Mergify cannot evaluate rules on this PR. ⚠️

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces updates for FP8 paged attention on ROCm. I've identified two critical compilation errors that need to be addressed. The first is a stray #endif directive that would break the build. The second is a variable scoping issue within a loop due to incorrect placement of preprocessor directives, which would also cause a compilation failure. Addressing these issues will ensure the code compiles and functions as intended.
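
The second issue the bot flags can be shown with a hypothetical snippet (not the actual kernel code): a variable declared inside a conditionally compiled block is only in scope in configurations where that block survives preprocessing, so any use outside the guarded region fails to compile, just as a stray #endif with no matching #if does:

```cpp
#include <cassert>

#define USE_FP8_PATH 1  // hypothetical macro standing in for the fp8 toggle

// Correct pattern: every configuration declares 'scaled', so the use after
// the #endif compiles either way. The bug described in the review is the
// variant where the declaration exists in only one branch (or the #endif is
// misplaced), leaving 'scaled' undeclared at its point of use.
float accumulate(const float* x, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
#if USE_FP8_PATH
        float scaled = x[i];        // fp8 path: value already dequantized
#else
        float scaled = x[i] * 0.5f; // fallback path
#endif
        acc += scaled;              // must be in scope in BOTH configurations
    }
    return acc;
}
```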

@mergify mergify bot added the rocm Related to AMD ROCm label Aug 5, 2025
@github-actions

github-actions bot commented Aug 5, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the ci/build label Aug 6, 2025
@xiao-llm xiao-llm marked this pull request as ready for review August 6, 2025 17:33
@BowenBao
Contributor

BowenBao commented Aug 8, 2025

cc @gshtras for review and comments

@mergify mergify bot added the performance Performance-related issues label Aug 13, 2025
@xiao-llm
Contributor Author

@tlrmchlsmth Can you please help me review this PR?

Collaborator

@gshtras gshtras left a comment


Solid job overall.
Could we have unit tests to cover the new fp8 path? Extend the existing ROCm attention test.
Another major point is that we need a way to actually enable this, through a command-line flag, env var, heuristic, etc., and not leave it as dead code.
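
One common way to satisfy this request is a runtime env-var gate on kernel dispatch. The sketch below is not the code this PR landed, but the variable name comes from the test plan above:

```cpp
#include <cstdlib>

// Sketch: gate the fp8 MFMA paged-attention path behind the env var used in
// the PR's test plan. Returns true only when the variable is set to "1".
bool use_fp8_mfma_page_attn() {
    const char* v = std::getenv("VLLM_ROCM_FP8_MFMA_PAGE_ATTN");
    return v != nullptr && v[0] == '1' && v[1] == '\0';
}
```

A heuristic keyed on the detected GFX architecture would be an alternative, but an explicit env var keeps the new path opt-in and easy to A/B test.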

@xiao-llm xiao-llm force-pushed the fp8_paged_attention_update branch from c851085 to 94018d1 Compare August 13, 2025 18:02
@mergify mergify bot added the documentation Improvements or additions to documentation label Aug 13, 2025
@xiao-llm
Contributor Author

Solid job overall. Could we have unit tests to cover the new fp8 path? Extend the additional rocm attention test. Another major point is that we need a way to actually enable this, through command line/env/heuristic/etc., and not leave as a dead code.
I can add an env setting like USE_FP8_MFMA; what do you think?

@mergify

mergify bot commented Aug 26, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @xiao-llm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Aug 26, 2025
@xiao-llm xiao-llm force-pushed the fp8_paged_attention_update branch from ae384dc to 4c0c4ed Compare August 26, 2025 23:26
@mergify mergify bot removed the needs-rebase label Aug 27, 2025
@gshtras gshtras removed the deepseek (Related to DeepSeek models) and gpt-oss (Related to GPT-OSS models) labels Sep 8, 2025
Signed-off-by: xiao-llm <xiao.yu.dc@outlook.com>
@gshtras gshtras added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 10, 2025
@gshtras gshtras enabled auto-merge (squash) September 10, 2025 21:59
Signed-off-by: xiao-llm <xiao.yu.dc@outlook.com>
@mergify

mergify bot commented Sep 11, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @xiao-llm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 11, 2025
Signed-off-by: xiao-llm <xiao.yu.dc@outlook.com>
auto-merge was automatically disabled September 14, 2025 20:41

Head branch was pushed to by a user without write access

@mergify mergify bot removed the needs-rebase label Sep 14, 2025
Signed-off-by: xiao-llm <xiao.yu.dc@outlook.com>
@gshtras gshtras merged commit 01413e0 into vllm-project:main Sep 15, 2025
80 checks passed
tlrmchlsmth pushed a commit to tlrmchlsmth/vllm that referenced this pull request Sep 15, 2025
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
Signed-off-by: xiao-llm <xiao.yu.dc@outlook.com>
Co-authored-by: Xiao Yu <xiao.yu@metamaterial.com>
Co-authored-by: Xiao Yu <xiao.yu@amd.com>
Co-authored-by: Bowen Bao <bowenbao@amd.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Labels

ready ONLY add when PR is ready to merge/full CI is needed
rocm Related to AMD ROCm

Projects

Status: Done

Development


4 participants