Added support of NANOO fp8. #3993

wenchenvincent · 2024-06-13T02:25:49Z

What does this PR do?

This PR adds support for the fp8 dot op with NANOO fp8 data formats (an alternative genre to the OCP fp8 data formats, which are used by NVIDIA GPUs).

There are several different genres of fp8 formats used by different HW vendors. Two popular genres include

- OCP fp8, which is used natively on NVIDIA H100
- NANOO fp8, which is used natively on AMD MI300 and Graphcore HW

These two genres of fp8 formats work very similarly. This PR enables support for NANOO fp8, which is now also supported in JAX and XLA, so that the fp8 dot op can be used on AMD MI300 GPUs.
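To make "work very similarly" concrete, here is a minimal sketch in plain Python that decodes the four 8-bit layouts named in the referenced JAX PR. The exponent biases and special-value rules are written out by hand from the public definitions of `float8_e4m3fn`, `float8_e5m2`, `float8_e4m3fnuz` and `float8_e5m2fnuz`. The names `Fp8Format`, `FORMATS`, `decode` and `max_finite` are hypothetical helpers for this illustration and are not part of Flax or of this PR.

```python
# Illustrative decoder for the OCP and NANOO fp8 layouts.
# Hand-written sketch for comparison only; not code from flax/linen/fp8_ops.py.
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Fp8Format:
    exp_bits: int
    man_bits: int
    bias: int
    fnuz: bool     # NANOO style: no inf, no negative zero, NaN is 0x80
    has_inf: bool  # only the OCP e5m2 layout keeps IEEE-style infinities


FORMATS = {
    # OCP genre
    "float8_e4m3fn": Fp8Format(4, 3, 7, fnuz=False, has_inf=False),
    "float8_e5m2": Fp8Format(5, 2, 15, fnuz=False, has_inf=True),
    # NANOO genre: exponent bias is one larger than the OCP counterpart
    "float8_e4m3fnuz": Fp8Format(4, 3, 8, fnuz=True, has_inf=False),
    "float8_e5m2fnuz": Fp8Format(5, 2, 16, fnuz=True, has_inf=False),
}


def decode(byte: int, fmt: Fp8Format) -> float:
    """Convert one 8-bit pattern to a Python float under the given layout."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp_max = (1 << fmt.exp_bits) - 1
    man_max = (1 << fmt.man_bits) - 1
    exp = (byte >> fmt.man_bits) & exp_max
    man = byte & man_max

    if fmt.fnuz:
        # The bit pattern that would be negative zero encodes the only NaN.
        if byte == 0x80:
            return math.nan
    elif fmt.has_inf:
        # IEEE-like: all-ones exponent is inf (mantissa 0) or NaN.
        if exp == exp_max:
            return sign * math.inf if man == 0 else math.nan
    elif exp == exp_max and man == man_max:
        # e4m3fn: only the all-ones pattern is NaN; there is no inf.
        return math.nan

    if exp == 0:
        # Zero and subnormals: no implicit leading one.
        return sign * man * 2.0 ** (1 - fmt.bias - fmt.man_bits)
    return sign * (1 + man / (1 << fmt.man_bits)) * 2.0 ** (exp - fmt.bias)


def max_finite(fmt: Fp8Format) -> float:
    """Largest finite value representable in the layout."""
    values = (decode(b, fmt) for b in range(256))
    return max(v for v in values if math.isfinite(v))


for name, fmt in FORMATS.items():
    print(name, max_finite(fmt))
# float8_e4m3fn 448.0
# float8_e5m2 57344.0
# float8_e4m3fnuz 240.0
# float8_e5m2fnuz 57344.0
```

The practical consequence for a quantized dot op is that the same byte decodes to different values in the two genres (for example, `0x38` is 1.0 in `float8_e4m3fn` but 0.5 in `float8_e4m3fnuz`), and the e4m3 maximum differs (448 versus 240), so any scaling factor derived from the format maximum has to be computed per dtype.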

References:

- OCP fp8 paper: https://arxiv.org/abs/2209.05433
- NANOO fp8 paper: https://arxiv.org/abs/2206.02915
- JAX PR: [ROCm] Add float8_e4m3fnuz and float8_e5m2fnuz support for Rocm jax-ml/jax#21376
- XLA PR: Fp8 matmul support on AMD MI300 openxla/xla#9531

wenchenvincent · 2024-06-13T02:34:42Z

@levskaya I noticed that you have reviewed several PRs regarding fp8. Could you take a look at this one?

wenchenvincent · 2024-06-20T01:38:13Z

@levskaya Could you kindly serve as the reviewer for this PR?

levskaya

Sorry for the delay!

Looks OK, but a few requests to simplify class configuration and to not break existing names in public API.

Try resubmitting for tests after fixing that, the failure before was from a transient unrelated breakage.

flax/linen/__init__.py

flax/linen/fp8_ops.py

wenchenvincent · 2024-06-24T15:21:56Z

@levskaya Thanks for the review. I have updated the PR to address the concerns. Could you take a look at the updates?

levskaya

Thanks for the fixes! We may need to do some tiny rebasing of simple things as the codebase just migrated to a python minver of 3.10.

wenchenvincent · 2024-06-28T04:58:44Z

> Thanks for the fixes! We may need to do some tiny rebasing of simple things as the codebase just migrated to a python minver of 3.10.

Thanks! Do you need me to rebase it to the tip of the tree?

levskaya · 2024-06-28T18:58:54Z

Yes, rebasing to tip as of today should pick up the 3.10 minver updates. Also, I'm seeing this failure in the tests:

FAILED tests/linen/linen_test.py::Fp8Test::test_fp8_meta_dtype0 - TypeError: missing a required argument: 'amax_history'
FAILED tests/linen/linen_test.py::Fp8Test::test_fp8_meta_dtype1 - TypeError: missing a required argument: 'amax_history'

could you fix that?
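The thread does not show the cause of this failure. The wording of the message matches what Python's `inspect.Signature.bind` raises when a call omits a parameter that has no default, which is the kind of error that shows up when a function gains a required argument and one call site is not updated. A minimal reproduction of that message, assuming a hypothetical `quantize_dequantize` function that is not taken from flax/linen/fp8_ops.py:

```python
import inspect


def quantize_dequantize(x, q_dtype, scale, amax_history):
    """Hypothetical stand-in for a helper that gained a required parameter."""
    return x


sig = inspect.signature(quantize_dequantize)

try:
    # An older call site that still passes only the first three arguments.
    sig.bind(1.0, "float8_e4m3fnuz", 1.0)
except TypeError as err:
    message = str(err)

print(message)  # missing a required argument: 'amax_history'
```

Under that assumption, the fix is either to pass the new argument at the stale call site or to give the parameter a default.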


Fp8DotGeneralOp API.

wenchenvincent · 2024-06-28T20:06:32Z

> Yes to tip as of today should have the 3.10 minver updates. Also, I'm seeing this failure in the tests:
> FAILED tests/linen/linen_test.py::Fp8Test::test_fp8_meta_dtype0 - TypeError: missing a required argument: 'amax_history'
> FAILED tests/linen/linen_test.py::Fp8Test::test_fp8_meta_dtype1 - TypeError: missing a required argument: 'amax_history'
> could you fix that?

Sorry I missed this test.

I just rebased and fixed this test.

codecov-commenter · 2024-06-28T22:50:59Z

Codecov Report

Attention: Patch coverage is 0% with 17 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (31adb00) to head (a6f52ae).
Report is 46 commits behind head on main.

Files	Patch %	Lines
flax/linen/fp8_ops.py	0.00%	16 Missing ⚠️
flax/linen/__init__.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@          Coverage Diff           @@
##            main   #3993    +/-   ##
======================================
  Coverage   0.00%   0.00%            
======================================
  Files        106     107     +1     
  Lines      13582   13767   +185     
======================================
- Misses     13582   13767   +185


levskaya

Hey sorry, after importing, I just noticed two more things that need to be fixed.

tests/linen/linen_test.py

flax/linen/fp8_ops.py


… tests

Imported from GitHub PR openxla/xla#16938

This PR adds support for NANOO FP8 data format in the collaborative communication unit tests.

- For the context on OCP FP8 and NANOO FP8, please refer to this comment: google/flax#3993 (comment)
- The unit tests in this PR are similar to GEMM unit test introduced in the following PR to be able to deal with both OCP and NANOO fp8 formats: openxla/xla#10488

Copybara import of the project:

-- 0fc74ccae6cfcaf4e8627ea338ee03783af0626b by Wen Chen <Wen.Chen@amd.com>:
[AMD] Added NCCL support for fp8e4m3fnuz and fp8e5m2fnuz.

-- d247af5cd33fe42698bb55ef1c18f32df8a02a21 by scxfjiang <sc.xfjiang@gmail.com>:
refactor tests for collective comm ops

-- 6f8c418b3052f7c531896bd5f8cbbc7a766ef7fc by scxfjiang <sc.xfjiang@gmail.com>:
rafactor collective comm e2e tests

-- 8ecb6ecf08a1536c5b3f8ba87e0e9f8813b1b359 by scxfjiang <sc.xfjiang@gmail.com>:
update: replace str

-- 338d3af2ca1a32302fdfe9d7abee335d24539ee9 by scxfjiang <sc.xfjiang@gmail.com>:
get rid of macros

Merging this change closes #16938

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16938 from ROCm:ci_dev_rccl_nanoo_fp8 338d3af2ca1a32302fdfe9d7abee335d24539ee9
PiperOrigin-RevId: 675635116