add a8w8 fp8 ck gemm tune support #1782

Merged: solinzby1 merged 10 commits into main from so/fp8_a8w8_ck on Jan 12, 2026
Conversation

@solinzby1 (Contributor)

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

@solinzby1 solinzby1 requested review from a team and Copilot January 7, 2026 07:31

Copilot AI left a comment


Pull request overview

This PR adds FP8 (8-bit floating point) support to the CK GEMM tuning infrastructure, expanding beyond the existing INT8 quantization. The changes enable tuning and execution of GEMM operations with FP8 quantized inputs while maintaining the existing INT8 functionality.

Key changes:

  • Extended instance generation to support both INT8 and FP8 kernel variants during tuning
  • Added FP8 data generation and reference computation paths in the Python tuning script
  • Modified the CUDA kernel dispatcher to handle both INT8 and FP8 input types with appropriate template parameters
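To make the FP8 data-generation and reference-computation path concrete, here is a minimal NumPy sketch of per-row FP8-style quantization and a dequantized reference GEMM. All helper names are hypothetical; the actual logic lives in gemm_a8w8_tune.py, and real FP8 rounding (float8_e4m3) is approximated here by clamping to the e4m3 range.

```python
import numpy as np

F8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3

def quantize_fp8(x, axis):
    # Choose a per-row (or per-column) scale so values fit the FP8 range.
    # Note: real FP8 also rounds mantissas; this sketch only rescales/clamps.
    scale = np.abs(x).max(axis=axis, keepdims=True) / F8_E4M3_MAX
    xq = np.clip(x / scale, -F8_E4M3_MAX, F8_E4M3_MAX)
    return xq, scale.astype(np.float32)

def ref_gemm_a8w8(xq, wq, x_scale, w_scale):
    # Reference output: GEMM on quantized values, then rescale:
    # out = (xq @ wq.T) * x_scale * w_scale.T
    return (xq.astype(np.float32) @ wq.astype(np.float32).T) * x_scale * w_scale.T

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)   # activations
w = rng.standard_normal((16, 8)).astype(np.float32)  # weights
xq, xs = quantize_fp8(x, axis=1)
wq, ws = quantize_fp8(w, axis=1)
out = ref_gemm_a8w8(xq, wq, xs, ws)  # should closely match x @ w.T
```

Because this sketch skips mantissa rounding, the reference output matches the unquantized product up to float32 rounding; the real tuning script compares kernel output against a reference like this within a tolerance.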

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Reviewed files:

  • csrc/ck_gemm_a8w8/gen_instances.py: refactored instance generation to create separate I8 and F8 kernel instances for tuning, with the appropriate dtype template parameters
  • csrc/ck_gemm_a8w8/gemm_a8w8_tune.py: added a quant_dtype parameter, FP8 data generation logic, and a reference computation that handles both I8 and FP8 quantization
  • csrc/ck_gemm_a8w8/gemm_a8w8_tune.cu: updated the kernel dispatcher to detect the input dtype (I8 vs. FP8) and dispatch to the matching kernel templates and scale dtypes
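The dispatch-on-input-dtype pattern described for gemm_a8w8_tune.cu can be sketched in Python for illustration (the real dispatcher is CUDA/C++; kernel names here are hypothetical, and float32 stands in for FP8 since NumPy has no native float8 dtype):

```python
import numpy as np

def tune_gemm_a8w8(x, w, x_scale, w_scale):
    """Pick the kernel family from the input dtype, mirroring the
    I8-vs-FP8 branch in the CUDA dispatcher (sketch only)."""
    if x.dtype == np.int8:
        kernel = "a8w8_i8_kernel"  # INT8 path: integer accumulate
        acc = x.astype(np.int32) @ w.astype(np.int32).T
    else:
        kernel = "a8w8_f8_kernel"  # FP8 path: floating-point accumulate
        acc = x.astype(np.float32) @ w.astype(np.float32).T
    # Apply per-row activation scales and per-row weight scales.
    out = acc.astype(np.float32) * x_scale * w_scale.T
    return kernel, out
```

The point of the branch is that the accumulator and scale dtypes must match the quantized input type, which is why the dispatcher selects different template instantiations for I8 and FP8.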


@solinzby1 solinzby1 requested review from yadaish and yzhou103 January 7, 2026 11:16
@yadaish (Contributor)

yadaish commented Jan 8, 2026

LGTM

yadaish previously approved these changes Jan 8, 2026
@yzhou103 (Contributor)

LGTM

@yadaish yadaish self-requested a review January 12, 2026 06:01
@ROCm ROCm deleted a comment from Copilot AI Jan 12, 2026
@ROCm ROCm deleted a comment from Copilot AI Jan 12, 2026
@solinzby1 solinzby1 merged commit 2e39dbe into main Jan 12, 2026
17 checks passed
@solinzby1 solinzby1 deleted the so/fp8_a8w8_ck branch January 12, 2026 06:24
zhuyuhua-v pushed a commit that referenced this pull request Jan 14, 2026
* add a8w8 fp8 tune support

* add q_dtype_w to deal with different type and refine config csv file

---------

Co-authored-by: solin <bingzhou@amd.com>
Co-authored-by: yzhou103 <Ying.Zhou2@amd.com>
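The second commit mentions adding q_dtype_w and refining the config CSV. A plausible reading is that tuned kernel selections are stored per (shape, quant dtype) key; the sketch below illustrates that idea with hypothetical column names and values (the actual CSV schema is not shown in this PR page):

```python
import csv
import io

# Hypothetical tuned-config rows: same GEMM shape can map to different
# best kernels depending on the weight quantization dtype (q_dtype_w).
rows = [
    {"M": 128, "N": 4096, "K": 4096, "q_dtype_w": "fp8", "kernel_id": 17},
    {"M": 128, "N": 4096, "K": 4096, "q_dtype_w": "int8", "kernel_id": 5},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["M", "N", "K", "q_dtype_w", "kernel_id"])
writer.writeheader()
writer.writerows(rows)

def lookup(csv_text, m, n, k, q_dtype_w):
    # Return the tuned kernel id for a (shape, dtype) key, or None if untuned.
    for r in csv.DictReader(io.StringIO(csv_text)):
        if (int(r["M"]), int(r["N"]), int(r["K"]), r["q_dtype_w"]) == (m, n, k, q_dtype_w):
            return int(r["kernel_id"])
    return None
```

Keying the config on the quant dtype avoids an INT8 tuning result being reused for FP8 inputs, whose fastest kernel instance may differ.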
5 participants