pyjhzwh (Contributor) commented Oct 2, 2025

Summary:
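Use the native NVFP4 matmul to emulate MX4 matmul: fix the global scale at 1.0 and restrict each per-block (local) scale to a power of two (an E8M0 value) stored as FP8 E4M3.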

global_scale = 1.0
fp32_local_scale = local_amax / 4.0 * global_scale
nvfp4_local_scale = fp8_e4m3(fp32_to_e8m0(fp32_local_scale))
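
For readers who want to check the scale arithmetic by hand, here is a minimal pure-Python sketch of the three lines above. It is not the FBGEMM implementation. The helper bodies are hypothetical stand-ins: `fp32_to_e8m0` is assumed to round down to a power of two (the floor convention of the OCP MX spec), and `is_exact_in_e4m3` assumes the FP8 E4M3 (fn) range, where powers of two from 2^-9 to 2^8 are exactly representable.

```python
import math

# Assumed FP8 E4M3 (fn variant) range for exact powers of two:
# subnormals reach 2**-9, the largest power of two is 2**8.
E4M3_MIN_EXP, E4M3_MAX_EXP = -9, 8


def fp32_to_e8m0(x: float) -> float:
    """Round a positive scale DOWN to a power of two (assumed rounding mode)."""
    if x <= 0.0:
        raise ValueError("scale must be positive")
    _, e = math.frexp(x)          # x = m * 2**e with 0.5 <= m < 1
    exp = max(-127, min(127, e - 1))  # floor(log2(x)), clamped to the E8M0 range
    return 2.0 ** exp


def is_exact_in_e4m3(scale: float) -> bool:
    """True if `scale` is a power of two that FP8 E4M3 stores without rounding."""
    m, e = math.frexp(scale)
    return m == 0.5 and E4M3_MIN_EXP <= e - 1 <= E4M3_MAX_EXP


def nvfp4_local_scale(block, global_scale: float = 1.0) -> float:
    """Per-block scale following the formula in the summary."""
    local_amax = max(abs(v) for v in block)
    fp32_local_scale = local_amax / 4.0 * global_scale
    scale = fp32_to_e8m0(fp32_local_scale)
    if not is_exact_in_e4m3(scale):
        # Outside this range the cast to FP8 E4M3 would no longer be lossless.
        raise ValueError(f"scale {scale} is not exactly representable in E4M3")
    return scale


if __name__ == "__main__":
    print(nvfp4_local_scale([0.1, -6.0, 2.0]))  # amax 6.0 -> 1.5 -> rounds down to 1.0
    print(nvfp4_local_scale([3.0, -1.0]))       # amax 3.0 -> 0.75 -> rounds down to 0.5
```

The sketch covers only the scale. Under these assumptions, an MX4-style power-of-two scale survives the cast to FP8 E4M3 unchanged only while it stays within 2^-9 to 2^8, which is why the check raises instead of silently clamping.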

Verify Correctness

Output from the buck2 command in the test plan:

I1001 103514.172 numerics_bench.py:310] Numeric metrics for mx_quant mx4 symm,e8m0,pow2max,even,none,e2m1,nearest,1x16,0 use_triton=True
I1001 103514.173 numerics_bench.py:115] runtime: 1.221 ms.
I1001 103514.173 numerics_bench.py:116] TFLOPS: 379.941.
I1001 103514.173 numerics_bench.py:117] output_abs_err_bf16: 0.013.
I1001 103514.173 numerics_bench.py:118] output_rel_err_bf16: 1.672.
I1001 103514.173 numerics_bench.py:119] output_sqnr_bf16: 16.094 dB.
I1001 103514.173 numerics_bench.py:121] output_abs_err_mx4: 0.000.
I1001 103514.174 numerics_bench.py:124] output_rel_err_mx4: 0.000.
I1001 103517.503 numerics_bench.py:310] Numeric metrics for native_mx4_as_nvfp4 mx4 symm,e8m0,pow2max,even,none,e2m1,nearest,1x16,0
I1001 103517.503 numerics_bench.py:115] runtime: 0.317 ms.
I1001 103517.504 numerics_bench.py:116] TFLOPS: 1463.532.
I1001 103517.504 numerics_bench.py:117] output_abs_err_bf16: 0.013.
I1001 103517.504 numerics_bench.py:118] output_rel_err_bf16: 1.672.
I1001 103517.504 numerics_bench.py:119] output_sqnr_bf16: 16.094 dB.
I1001 103517.504 numerics_bench.py:121] output_abs_err_mx4: 0.000.
I1001 103517.504 numerics_bench.py:124] output_rel_err_mx4: 0.000.
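
As a quick sanity check on the numbers logged above, the short script below derives the speedup from the reported runtimes and TFLOPS. The values are copied from the log lines. The dictionary layout and variable names are only illustrative.

```python
# (runtime in ms, TFLOPS) as reported by numerics_bench.py in the log above.
runs = {
    "mx_quant (use_triton=True)": (1.221, 379.941),
    "native_mx4_as_nvfp4": (0.317, 1463.532),
}

base_ms, base_tflops = runs["mx_quant (use_triton=True)"]
new_ms, new_tflops = runs["native_mx4_as_nvfp4"]

# Speedup computed two ways; both should agree if the workload is the same.
runtime_speedup = base_ms / new_ms
tflops_ratio = new_tflops / base_tflops

# Total work implied by each run: TFLOPS * 1e12 * seconds.
base_flops = base_tflops * 1e12 * base_ms * 1e-3
new_flops = new_tflops * 1e12 * new_ms * 1e-3

print(f"runtime speedup: {runtime_speedup:.2f}x")
print(f"TFLOPS ratio:    {tflops_ratio:.2f}x")
print(f"implied FLOPs:   {base_flops:.3e} vs {new_flops:.3e}")
```

Both ratios come out at roughly 3.85x and the implied FLOP counts agree, so the two runs measured the same problem size. The error metrics are identical in both runs.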

Also verified that the outputs match bit-for-bit by comparing them in bento N8236690.

Differential Revision: D83677495

netlify bot commented Oct 2, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit 97ec628
🔍 Latest deploy log https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68e041f2b55300000866e6a6
😎 Deploy Preview https://deploy-preview-4970--pytorch-fbgemm-docs.netlify.app

meta-codesync bot (Contributor) commented Oct 2, 2025

@pyjhzwh has exported this pull request. If you are a Meta employee, you can view the originating Diff in D83677495.

Summary:

X-link: facebookresearch/FBGEMM#1987

Reviewed By: ghjeong12

Differential Revision: D83677495