
Conversation

@yewentao256 (Member) commented on Jun 13, 2025:

Purpose

Fixes #19364

Test

python bench_fp8_gemm.py 

BF16 vs FP8 GEMMs:
    batch_size  torch-bf16  fp8-tensor-w-tensor-a  fp8-channel-w-token-a  fp8-tensor-w-tensor-a-noquant  fp8-channel-w-token-a-noquant
0          1.0    2.560670               4.320622               4.042926                       5.431362                       5.383086
1         16.0   39.288055              70.105174              66.377700                      88.413925                      89.528846
2         64.0  160.525909             304.590458             286.944317                     396.810053                     405.164967
3        128.0  308.701123             539.554228             510.844428                     703.054311                     702.508732
4        256.0  541.459009             772.972745             742.389709                     942.623998                     952.026312
5        512.0  646.868024             866.601296             810.128518                    1010.673914                    1007.868419
6       1024.0  651.920065            1078.126444            1021.105427                    1248.947844                    1242.909159
7       2048.0  706.403803            1118.760714            1057.808026                    1269.786307                    1270.055588
8       4096.0  730.363859            1098.713779            1051.117136                    1259.165539                    1249.199156
9       8192.0  695.302733            1155.644699            1106.260386                    1271.727617                    1311.712990
10     16384.0  722.155455            1183.444196            1121.000376                    1273.485919                    1289.554055
...
Benchmark finished!

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@github-actions (bot) commented:

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@yewentao256 changed the title from "refactor benchmark script for fp8 & int8" to "[Benchmark] Refactor benchmark script for fp8 & int8" on Jun 13, 2025
@gemini-code-assist (Contributor, bot) left a comment:

Summary of Changes

Hello @yewentao256, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the benchmark scripts for FP8 and INT8 GEMM operations. The primary goal is to improve code organization and maintainability by centralizing configuration options and abstracting the setup logic for different quantization and execution modes into dedicated helper functions. This makes the benchmark code cleaner and easier to extend or modify in the future.

Highlights

  • Refactor Benchmark Logic: Introduced PROVIDER_CFGS dictionaries in both bench_fp8_gemm.py and bench_int8_gemm.py to centralize and configure different quantization and GEMM execution strategies (tensor/channel weight, token/tensor activation, with/without activation quant); a sketch of this pattern follows the list.
  • Abstracted Runner Creation: Added build_fp8_runner and build_int8_runner helper functions to encapsulate the logic for setting up the specific benchmark run based on the chosen configuration, significantly simplifying the main benchmark function.
  • Simplified Benchmark Function: Replaced large conditional blocks in the benchmark functions with calls to the new build_*_runner helpers, making the core benchmarking logic cleaner and easier to read.
  • Dynamic Provider List: Updated the triton.testing.Benchmark definition in both scripts to dynamically use the enabled providers from the PROVIDER_CFGS dictionary, making it easier to control which configurations are benchmarked.
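
For context, a minimal sketch (not the merged code) of the PROVIDER_CFGS / build_fp8_runner pattern described in the highlights above; the dictionary fields, the "no_a_quant" flag name, and the plain-matmul stand-in for the quantized GEMM kernel are illustrative assumptions:

```python
# Hypothetical sketch of the PROVIDER_CFGS / build_fp8_runner pattern.
# Field names and the matmul stand-in are assumptions, not the merged code.
import torch

PROVIDER_CFGS = {
    "torch-bf16": {"enabled": True},
    "fp8-tensor-w-tensor-a": {"w": "tensor", "a": "tensor", "no_a_quant": False, "enabled": True},
    "fp8-channel-w-token-a": {"w": "channel", "a": "token", "no_a_quant": False, "enabled": True},
    "fp8-tensor-w-tensor-a-noquant": {"w": "tensor", "a": "tensor", "no_a_quant": True, "enabled": True},
    "fp8-channel-w-token-a-noquant": {"w": "channel", "a": "token", "no_a_quant": True, "enabled": True},
}


def build_fp8_runner(cfg, a, b):
    """Return a zero-argument callable that runs one GEMM for the given provider.

    In the real script the weight would be quantized once (per-tensor or
    per-channel) here, and the returned closure would call the quantized GEMM
    kernel; a plain matmul stands in so the sketch stays self-contained.
    """
    if cfg.get("no_a_quant"):
        # Activation quantization happens once, outside the timed closure.
        def run():
            return a @ b
    else:
        def run():
            # Activation quantization would happen here, inside the timed closure.
            return a @ b
    return run


# Example dispatch: the benchmark body reduces to a single call instead of if/elif chains.
a = torch.randn(16, 4096, dtype=torch.bfloat16)
b = torch.randn(4096, 4096, dtype=torch.bfloat16)
run = build_fp8_runner(PROVIDER_CFGS["fp8-tensor-w-tensor-a"], a, b)
out = run()
```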
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@gemini-code-assist (Contributor, bot) left a comment:

Code Review

The pull request refactors the benchmark scripts for fp8 and int8 quantization, significantly improving their structure and maintainability. Key changes include centralizing provider configurations into a PROVIDER_CFGS dictionary and using builder functions (build_fp8_runner, build_int8_runner) to dynamically create the benchmark execution logic. This replaces large if/elif blocks, making the code cleaner and easier to extend.

My review focuses on:

  • Maintainability: Suggesting minor refactorings to reduce code duplication within the new builder functions.
  • Consistency: Pointing out an inconsistency in SPDX headers between the two modified files and in how enabled providers are filtered.
  • Usability: Highlighting the removal of helpful command-line argument descriptions.
  • Code Hygiene: Noting an unused parameter in the builder functions.

Overall, the refactoring is a positive step. The suggestions aim to further polish the changes.
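
One concrete reading of the dynamic provider list described above, sketched against triton.testing.Benchmark; the batch sizes come from the benchmark output in the PR description, while the labels, plot name, and metric are assumptions rather than the merged code:

```python
# Sketch: derive the benchmarked providers from PROVIDER_CFGS instead of
# hard-coding them in the decorator. Labels and plot name are assumptions.
import triton.testing

PROVIDER_CFGS = {  # trimmed stand-in for the dictionary sketched earlier
    "torch-bf16": {"enabled": True},
    "fp8-tensor-w-tensor-a": {"enabled": True},
    "fp8-channel-w-token-a": {"enabled": False},
}

_enabled = [name for name, cfg in PROVIDER_CFGS.items() if cfg.get("enabled")]

bench_cfg = triton.testing.Benchmark(
    x_names=["batch_size"],
    x_vals=[1, 16, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384],
    line_arg="provider",
    line_vals=_enabled,   # only enabled providers are benchmarked
    line_names=_enabled,  # and labeled in the resulting table/plot
    ylabel="throughput",  # the merged script's actual metric/label may differ
    plot_name="bf16-vs-fp8-gemm",
    args={},
)


@triton.testing.perf_report(bench_cfg)
def benchmark(batch_size: int, provider: str):
    # The real benchmark builds a runner via build_fp8_runner(...) and times it
    # with triton.testing.do_bench; omitted here to keep the sketch short.
    ...
```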

Signed-off-by: yewentao256 <zhyanwentao@126.com>
@houseroad (Collaborator) left a comment:

Looks good.

type=str,
default=["meta-llama/Llama-3.1-8B-Instruct"],
choices=[*WEIGHT_SHAPES.keys()],
help="List of models to benchmark",
@houseroad (Collaborator) commented on the quoted lines:
why remove the help here?

@yewentao256 (Member, Author) replied:

I think the name "models" already conveys the meaning and we can save some space by omitting "help", but I can certainly add it back if you wish.
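
For reference, restoring the help string as discussed might look roughly like the following sketch; the --models flag name, nargs, and the WEIGHT_SHAPES stand-in are assumptions inferred from the quoted snippet, not the exact repository code:

```python
# Sketch of re-adding the help text to the models argument.
import argparse

WEIGHT_SHAPES = {"meta-llama/Llama-3.1-8B-Instruct": []}  # stand-in for the real shapes table

parser = argparse.ArgumentParser(description="Benchmark BF16 vs FP8/INT8 GEMMs")
parser.add_argument(
    "--models",
    nargs="+",
    type=str,
    default=["meta-llama/Llama-3.1-8B-Instruct"],
    choices=[*WEIGHT_SHAPES.keys()],
    help="List of models to benchmark",
)
args = parser.parse_args()
```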

@houseroad added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jun 14, 2025
@houseroad merged commit 3d330c4 into vllm-project:main on Jun 15, 2025
54 checks passed
@yewentao256 deleted the wye-refactor-benchmark-script branch on June 16, 2025 at 13:07

Labels

ready (ONLY add when PR is ready to merge/full CI is needed)


Development

Successfully merging this pull request may close these issues.

[Benchmark Script] Refactor benchmark script for bench_datatype_gemm

2 participants