[Benchmark] Refactor benchmark script for fp8 & int8 #19627

yewentao256 · 2025-06-13T21:44:53Z

Purpose

Fixes #19364

Test

python bench_fp8_gemm.py 

BF16 vs FP8 GEMMs:
    batch_size  torch-bf16  fp8-tensor-w-tensor-a  fp8-channel-w-token-a  fp8-tensor-w-tensor-a-noquant  fp8-channel-w-token-a-noquant
0          1.0    2.560670               4.320622               4.042926                       5.431362                       5.383086
1         16.0   39.288055              70.105174              66.377700                      88.413925                      89.528846
2         64.0  160.525909             304.590458             286.944317                     396.810053                     405.164967
3        128.0  308.701123             539.554228             510.844428                     703.054311                     702.508732
4        256.0  541.459009             772.972745             742.389709                     942.623998                     952.026312
5        512.0  646.868024             866.601296             810.128518                    1010.673914                    1007.868419
6       1024.0  651.920065            1078.126444            1021.105427                    1248.947844                    1242.909159
7       2048.0  706.403803            1118.760714            1057.808026                    1269.786307                    1270.055588
8       4096.0  730.363859            1098.713779            1051.117136                    1259.165539                    1249.199156
9       8192.0  695.302733            1155.644699            1106.260386                    1271.727617                    1311.712990
10     16384.0  722.155455            1183.444196            1121.000376                    1273.485919                    1289.554055
...
Benchmark finished!

Signed-off-by: yewentao256 <zhyanwentao@126.com>

github-actions · 2025-06-13T21:45:00Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Summary of Changes

Hello @yewentao256, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the benchmark scripts for FP8 and INT8 GEMM operations. The primary goal is to improve code organization and maintainability by centralizing configuration options and abstracting the setup logic for different quantization and execution modes into dedicated helper functions. This makes the benchmark code cleaner and easier to extend or modify in the future.

Highlights

Refactor Benchmark Logic: Introduced PROVIDER_CFGS dictionaries in both bench_fp8_gemm.py and bench_int8_gemm.py to centralize and configure different quantization and GEMM execution strategies (tensor/channel weight, token/tensor activation, with/without activation quant).
Abstracted Runner Creation: Added build_fp8_runner and build_int8_runner helper functions to encapsulate the logic for setting up the specific benchmark run based on the chosen configuration, significantly simplifying the main benchmark function.
Simplified Benchmark Function: Replaced large conditional blocks in the benchmark functions with calls to the new build_*_runner helpers, making the core benchmarking logic cleaner and easier to read.
Dynamic Provider List: Updated the triton.testing.Benchmark definition in both scripts to dynamically use the enabled providers from the PROVIDER_CFGS dictionary, making it easier to control which configurations are benchmarked.
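
The pattern described in these highlights can be sketched roughly as follows. This is not the code from the PR: the provider names mirror the columns of the benchmark output, but the field names (w, a, no_a_quant, enabled), the single build_runner helper, and the pure-Python quantize and scaled_mm stand-ins are all assumptions made so the sketch runs without a GPU. It also assumes that the "-noquant" variants keep activation quantization outside the timed call.

```python
from typing import Callable

Matrix = list[list[float]]

# Hypothetical config table. Provider names mirror the benchmark output columns;
# the field names (w, a, no_a_quant, enabled) are assumptions.
PROVIDER_CFGS = {
    "torch-bf16": dict(enabled=True),
    "fp8-tensor-w-tensor-a": dict(w="tensor", a="tensor", no_a_quant=False, enabled=True),
    "fp8-channel-w-token-a": dict(w="channel", a="token", no_a_quant=False, enabled=True),
    "fp8-tensor-w-tensor-a-noquant": dict(w="tensor", a="tensor", no_a_quant=True, enabled=True),
    "fp8-channel-w-token-a-noquant": dict(w="channel", a="token", no_a_quant=True, enabled=False),
}

# Only enabled providers would be handed to the benchmark definition.
ENABLED = [name for name, cfg in PROVIDER_CFGS.items() if cfg["enabled"]]


def quantize(rows: Matrix, per_row: bool) -> tuple[Matrix, list[float]]:
    """Stand-in quantizer: divide by a per-row or per-tensor max-abs scale."""
    def amax(vals):
        return max((abs(v) for v in vals), default=0.0) or 1.0
    if per_row:
        scales = [amax(r) for r in rows]
    else:
        scales = [amax([v for r in rows for v in r])] * len(rows)
    return [[v / s for v in r] for r, s in zip(rows, scales)], scales


def scaled_mm(qa: Matrix, qb: Matrix, sa: list[float], sb: list[float]) -> Matrix:
    """Stand-in for a scaled GEMM; b is stored as N x K (one row per output channel)."""
    return [[sum(x * y for x, y in zip(ra, rb)) * s_a * s_b
             for rb, s_b in zip(qb, sb)]
            for ra, s_a in zip(qa, sa)]


def build_runner(cfg: dict, a: Matrix, b: Matrix) -> Callable[[], Matrix]:
    """Return a zero-argument callable holding exactly the work to be timed."""
    if "w" not in cfg:  # unquantized baseline
        ones_a, ones_b = [1.0] * len(a), [1.0] * len(b)
        return lambda: scaled_mm(a, b, ones_a, ones_b)
    # Weights are quantized once, up front, for every quantized provider.
    qb, sb = quantize(b, per_row=cfg["w"] == "channel")
    per_token = cfg["a"] == "token"
    if cfg["no_a_quant"]:
        # Activation quantization is done here, outside the timed call.
        qa, sa = quantize(a, per_row=per_token)
        return lambda: scaled_mm(qa, qb, sa, sb)

    def run() -> Matrix:
        # Activation quantization happens inside the timed call.
        qa, sa = quantize(a, per_row=per_token)
        return scaled_mm(qa, qb, sa, sb)
    return run


a = [[1.0, 2.0], [3.0, 4.0]]  # activations, M x K
b = [[1.0, 0.0], [0.0, 1.0]]  # weights, N x K
results = {name: build_runner(PROVIDER_CFGS[name], a, b)() for name in ENABLED}
```

With this shape, adding or disabling a configuration is a one-line change to the table, and the benchmark function only needs to look up a config and call the runner it gets back.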

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

The pull request refactors the benchmark scripts for fp8 and int8 quantization, significantly improving their structure and maintainability. Key changes include centralizing provider configurations into a PROVIDER_CFGS dictionary and using builder functions (build_fp8_runner, build_int8_runner) to dynamically create the benchmark execution logic. This replaces large if/elif blocks, making the code cleaner and easier to extend.

My review focuses on:

Maintainability: Suggesting minor refactorings to reduce code duplication within the new builder functions.
Consistency: Pointing out an inconsistency in SPDX headers between the two modified files and in how enabled providers are filtered.
Usability: Highlighting the removal of helpful command-line argument descriptions.
Code Hygiene: Noting an unused parameter in the builder functions.

Overall, the refactoring is a positive step. The suggestions aim to further polish the changes.

benchmarks/kernels/bench_fp8_gemm.py

benchmarks/kernels/bench_int8_gemm.py

Signed-off-by: yewentao256 <zhyanwentao@126.com>

houseroad

Looks good.

houseroad · 2025-06-14T13:14:37Z

benchmarks/kernels/bench_fp8_gemm.py

        type=str,
        default=["meta-llama/Llama-3.1-8B-Instruct"],
-        choices=[*WEIGHT_SHAPES.keys()],
-        help="List of models to benchmark",


Why remove the help here?

yewentao256 · 2025-06-14

I think the name "models" already conveys the meaning, so we can save the space taken by "help", but I can certainly add it back if you wish.
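
For context, a minimal sketch of what the argument looks like with choices and help kept. The flag name --models and nargs="+" are assumptions (the diff shows only the type, default, choices, and help lines), and WEIGHT_SHAPES here is a hypothetical one-entry stand-in for the table the script uses.

```python
import argparse

# Hypothetical stand-in for the script's WEIGHT_SHAPES table; the value is a placeholder.
WEIGHT_SHAPES = {
    "meta-llama/Llama-3.1-8B-Instruct": [],
}

parser = argparse.ArgumentParser(description="GEMM benchmark (sketch)")
parser.add_argument(
    "--models",  # assumed flag name
    nargs="+",   # assumed: accept one or more model names
    type=str,
    default=["meta-llama/Llama-3.1-8B-Instruct"],
    choices=[*WEIGHT_SHAPES.keys()],     # unknown model names are rejected at parse time
    help="List of models to benchmark",  # printed by --help
)

# Parsing an empty argument list falls back to the default.
args = parser.parse_args([])
help_text = parser.format_help()
```

The help string costs one line in the source and is what a user sees when running the script with --help, which is the usability point the reviewer raises.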

refactor benchmark script for fp8 & int8

75d809a

Signed-off-by: yewentao256 <zhyanwentao@126.com>

yewentao256 changed the title ~~refactor benchmark script for fp8 & int8~~ [Benchmark] Refactor benchmark script for fp8 & int8 Jun 13, 2025

gemini-code-assist bot reviewed Jun 13, 2025

remove unused N

a856418

Signed-off-by: yewentao256 <zhyanwentao@126.com>

houseroad approved these changes Jun 14, 2025

houseroad added the ready label Jun 14, 2025

houseroad merged commit 3d330c4 into vllm-project:main Jun 15, 2025
54 checks passed

yewentao256 deleted the wye-refactor-benchmark-script branch June 16, 2025 13:07
