Support FP8 grouped GEMM with cudagraph #3373

Open
wants to merge 2 commits into base: main

Commits on Nov 14, 2024

  1. Refactor FP8 grouped GEMM to prepare cudagraph support (pytorch#3369)

    Summary:
    
    X-link: facebookresearch/FBGEMM#460
    
    Refactor FP8 grouped GEMM to extract the grouped GEMM arguments and configurations ahead of the grouped GEMM kernel launch, so that they can be reused by another CUDA kernel that performs argument setup on device (see the first sketch after this commit list).
    
    Differential Revision: D65548954
    jiawenliu64 authored and facebook-github-bot committed Nov 14, 2024
    Full SHA: 9b4b04b
  2. Support FP8 grouped GEMM with cudagraph (pytorch#3373)

    Summary:
    
    X-link: facebookresearch/FBGEMM#463
    
    Enable cudagraph support for FP8 grouped GEMM
    
    Compared to the CUDA graph support for the CK grouped GEMM in D65634843, this is considerably more challenging: graph capture here has to handle more complicated kernel arguments, including the various per-group pointer arrays and their memory alignment (see the second sketch after this commit list).
    
    Differential Revision: D65864972
    jiawenliu64 authored and facebook-github-bot committed Nov 14, 2024
    Full SHA: 1c3720a
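
The refactor in the first commit matters because a CUDA graph freezes host-side work at capture time: any pointer arrays built on the host would be baked into the graph and never refreshed. A minimal sketch of the idea follows. All names here (GroupedGemmProblem, setup_grouped_gemm_args) are hypothetical illustrations, not FBGEMM's actual API: a small device kernel rebuilds the per-group pointer arrays from precomputed offsets, so the setup itself is replayable inside the graph.

#include <cuda_runtime.h>
#include <cstdint>

// Per-group problem shape and byte offsets into the packed buffers,
// precomputed once and stored in device memory. FP8 elements are one
// byte each, so byte offsets double as element offsets for A and B.
struct GroupedGemmProblem {
  int m, n, k;
  int64_t a_offset;
  int64_t b_offset;
  int64_t c_offset;
};

// Builds the per-group pointer arrays that the grouped GEMM kernel
// consumes. Because this runs on the device, it can be captured into a
// CUDA graph and re-executed on every replay, instead of being frozen
// at capture time like host-side setup would be.
__global__ void setup_grouped_gemm_args(
    const GroupedGemmProblem* problems,
    const uint8_t* a_base,
    const uint8_t* b_base,
    uint8_t* c_base,
    const void** a_ptrs,
    const void** b_ptrs,
    void** c_ptrs,
    int num_groups) {
  int g = blockIdx.x * blockDim.x + threadIdx.x;
  if (g < num_groups) {
    a_ptrs[g] = a_base + problems[g].a_offset;
    b_ptrs[g] = b_base + problems[g].b_offset;
    c_ptrs[g] = c_base + problems[g].c_offset;
  }
}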
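Continuing the sketch above, the second commit's capture/replay pattern might look roughly like the following. launch_fp8_grouped_gemm is a hypothetical stand-in for the real grouped GEMM dispatch, and all buffers are assumed to be stable device allocations that outlive the graph.

// Hypothetical stand-in for the actual grouped GEMM dispatch.
void launch_fp8_grouped_gemm(
    const void** a_ptrs, const void** b_ptrs, void** c_ptrs,
    const GroupedGemmProblem* problems, int num_groups, cudaStream_t stream);

void capture_and_replay(
    const GroupedGemmProblem* problems,
    const uint8_t* a_base, const uint8_t* b_base, uint8_t* c_base,
    const void** a_ptrs, const void** b_ptrs, void** c_ptrs,
    int num_groups) {
  cudaStream_t stream;
  cudaStreamCreate(&stream);

  constexpr int kThreads = 128;
  const int blocks = (num_groups + kThreads - 1) / kThreads;

  // Capture both the argument-setup kernel and the grouped GEMM on the
  // same stream, so the pointer arrays are rebuilt during every replay.
  cudaGraph_t graph;
  cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
  setup_grouped_gemm_args<<<blocks, kThreads, 0, stream>>>(
      problems, a_base, b_base, c_base, a_ptrs, b_ptrs, c_ptrs, num_groups);
  launch_fp8_grouped_gemm(a_ptrs, b_ptrs, c_ptrs, problems, num_groups, stream);
  cudaStreamEndCapture(stream, &graph);

  cudaGraphExec_t graph_exec;
  cudaGraphInstantiate(&graph_exec, graph, nullptr, nullptr, 0);

  // Replay: the data behind a_base/b_base may change between launches,
  // but every address baked into the graph stays valid because the
  // pointer arrays are recomputed on-device inside the graph itself.
  cudaGraphLaunch(graph_exec, stream);
  cudaStreamSynchronize(stream);

  cudaGraphExecDestroy(graph_exec);
  cudaGraphDestroy(graph);
  cudaStreamDestroy(stream);
}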