Add NANOO FP8 support for collaborative communication unit tests #16938

Closed
wants to merge 6 commits into from


@ScXfjiang ScXfjiang commented Sep 9, 2024

This PR adds support for the NANOO FP8 data format in the collaborative communication unit tests.

@ScXfjiang ScXfjiang marked this pull request as draft September 9, 2024 14:38
@ScXfjiang ScXfjiang marked this pull request as ready for review September 9, 2024 14:52
@NaiyerRizz NaiyerRizz requested review from reedwm and ddunl September 10, 2024 06:37
@NaiyerRizz NaiyerRizz self-assigned this Sep 10, 2024
@wenchenvincent
Contributor

@reedwm Could you take a look at this PR?

@@ -54,6 +55,21 @@ DeviceAssignment MakeDeviceAssn(int64_t num_replicas) {

class CollectiveOpsTestE2E : public HloTestBase {
 public:
  CollectiveOpsTestE2E() {
    replacements_[kF8E4M3DatatypePlaceholder] =
#if GOOGLE_CUDA
Member

We're trying to avoid using macros like GOOGLE_CUDA and instead check at runtime. Can you check this via stream executor instead, similar to what is done in gemm_rewrite_test?

  const auto& device_desc() const {
    return backend().default_stream_executor()->GetDeviceDescription();
  }
 protected:
  const se::GpuComputeCapability& Capability() const {
    return device_desc().gpu_compute_capability();
  }
  stream_executor::SemanticVersion GetToolkitVersion() const {
    return backend()
        .default_stream_executor()
        ->GetDeviceDescription()
        .runtime_version();
  }
  bool IsCuda() const {
    return std::holds_alternative<se::CudaComputeCapability>(Capability());
  }
  bool IsRocm() const {
    return std::holds_alternative<se::RocmComputeCapability>(Capability());
  }
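
The runtime check requested here can be illustrated outside of XLA. The following is a minimal, self-contained sketch under stated assumptions, not the code from this PR: `CudaCapability`, `RocmCapability`, `GpuCapability`, `MakeFp8Replacements`, `ReplaceAll`, and `SubstituteFp8Types` are hypothetical stand-ins, and the `<<F8E4M3>>` / `<<F8E5M2>>` placeholder spellings are assumed. It mirrors the `std::holds_alternative` pattern from the snippet above and, as a simplification, maps a CUDA device to the OCP type names (`f8e4m3fn`, `f8e5m2`) and a ROCm device to the NANOO ones (`f8e4m3fnuz`, `f8e5m2fnuz`).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for se::CudaComputeCapability and
// se::RocmComputeCapability; the real XLA types carry more information.
struct CudaCapability { int major = 9; int minor = 0; };
struct RocmCapability { std::string gcn_arch = "gfx942"; };
using GpuCapability = std::variant<CudaCapability, RocmCapability>;

// Runtime check: inspects which alternative the variant holds,
// so no GOOGLE_CUDA / TENSORFLOW_USE_ROCM preprocessor branch is needed.
bool IsRocm(const GpuCapability& cap) {
  return std::holds_alternative<RocmCapability>(cap);
}

// Assumed placeholder spellings used inside the HLO test strings.
const char kF8E4M3Placeholder[] = "<<F8E4M3>>";
const char kF8E5M2Placeholder[] = "<<F8E5M2>>";

// Builds the placeholder -> concrete type table for the detected backend.
// Assumption: ROCm uses the NANOO (fnuz) variants, CUDA the OCP variants.
std::vector<std::pair<std::string, std::string>> MakeFp8Replacements(
    const GpuCapability& cap) {
  const bool rocm = IsRocm(cap);
  std::vector<std::pair<std::string, std::string>> table;
  table.emplace_back(kF8E4M3Placeholder,
                     std::string(rocm ? "f8e4m3fnuz" : "f8e4m3fn"));
  table.emplace_back(kF8E5M2Placeholder,
                     std::string(rocm ? "f8e5m2fnuz" : "f8e5m2"));
  return table;
}

// Replaces every occurrence of `from` in `text` with `to`.
std::string ReplaceAll(std::string text, const std::string& from,
                       const std::string& to) {
  assert(!from.empty());  // An empty pattern would never advance.
  for (std::string::size_type pos = text.find(from);
       pos != std::string::npos; pos = text.find(from, pos + to.size())) {
    text.replace(pos, from.size(), to);
  }
  return text;
}

// Applies all FP8 placeholder replacements to an HLO-like string.
std::string SubstituteFp8Types(std::string hlo, const GpuCapability& cap) {
  for (const auto& [from, to] : MakeFp8Replacements(cap)) {
    hlo = ReplaceAll(std::move(hlo), from, to);
  }
  return hlo;
}
```

With a `RocmCapability`, `SubstituteFp8Types("p = <<F8E4M3>>[4] parameter(0)", cap)` yields `p = f8e4m3fnuz[4] parameter(0)`; with a `CudaCapability` the same input yields `p = f8e4m3fn[4] parameter(0)`. Because the choice is made from the device description at run time, one test binary can cover both backends.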

Contributor Author

@reedwm Hi, I have updated it!

@ScXfjiang ScXfjiang requested a review from reedwm September 17, 2024 11:04
copybara-service bot pushed a commit that referenced this pull request Sep 17, 2024
… tests

Imported from GitHub PR #16938

This PR adds support for NANOO FP8 data format in the collaborative communication unit tests.
- For the context on OCP FP8 and NANOO FP8, please refer to this comment:
google/flax#3993 (comment)
- The unit tests in this PR are similar to GEMM unit test introduced in the following PR to be able to deal with both OCP and NANOO fp8 formats:
#10488
Copybara import of the project:

--
0fc74cc by Wen Chen <Wen.Chen@amd.com>:

[AMD] Added NCCL support for fp8e4m3fnuz and fp8e5m2fnuz.

--
d247af5 by scxfjiang <sc.xfjiang@gmail.com>:

refactor tests for collective comm ops

--
6f8c418 by scxfjiang <sc.xfjiang@gmail.com>:

rafactor collective comm e2e tests

--
8ecb6ec by scxfjiang <sc.xfjiang@gmail.com>:

update: replace str

--
338d3af by scxfjiang <sc.xfjiang@gmail.com>:

get rid of macros

Merging this change closes #16938

FUTURE_COPYBARA_INTEGRATE_REVIEW=#16938 from ROCm:ci_dev_rccl_nanoo_fp8 338d3af
PiperOrigin-RevId: 675635116
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 17, 2024
… tests

Imported from GitHub PR openxla/xla#16938

This PR adds support for NANOO FP8 data format in the collaborative communication unit tests.
- For the context on OCP FP8 and NANOO FP8, please refer to this comment:
google/flax#3993 (comment)
- The unit tests in this PR are similar to GEMM unit test introduced in the following PR to be able to deal with both OCP and NANOO fp8 formats:
openxla/xla#10488
Copybara import of the project:

--
0fc74ccae6cfcaf4e8627ea338ee03783af0626b by Wen Chen <Wen.Chen@amd.com>:

[AMD] Added NCCL support for fp8e4m3fnuz and fp8e5m2fnuz.

--
d247af5cd33fe42698bb55ef1c18f32df8a02a21 by scxfjiang <sc.xfjiang@gmail.com>:

refactor tests for collective comm ops

--
6f8c418b3052f7c531896bd5f8cbbc7a766ef7fc by scxfjiang <sc.xfjiang@gmail.com>:

rafactor collective comm e2e tests

--
8ecb6ecf08a1536c5b3f8ba87e0e9f8813b1b359 by scxfjiang <sc.xfjiang@gmail.com>:

update: replace str

--
338d3af2ca1a32302fdfe9d7abee335d24539ee9 by scxfjiang <sc.xfjiang@gmail.com>:

get rid of macros

Merging this change closes #16938

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16938 from ROCm:ci_dev_rccl_nanoo_fp8 338d3af2ca1a32302fdfe9d7abee335d24539ee9
PiperOrigin-RevId: 675635116
copybara-service bot pushed a commit that referenced this pull request Sep 18, 2024
… tests

Imported from GitHub PR #16938

This PR adds support for NANOO FP8 data format in the collaborative communication unit tests.
- For the context on OCP FP8 and NANOO FP8, please refer to this comment:
google/flax#3993 (comment)
- The unit tests in this PR are similar to GEMM unit test introduced in the following PR to be able to deal with both OCP and NANOO fp8 formats:
#10488
Copybara import of the project:

--
0fc74cc by Wen Chen <Wen.Chen@amd.com>:

[AMD] Added NCCL support for fp8e4m3fnuz and fp8e5m2fnuz.

--
d247af5 by scxfjiang <sc.xfjiang@gmail.com>:

refactor tests for collective comm ops

--
6f8c418 by scxfjiang <sc.xfjiang@gmail.com>:

rafactor collective comm e2e tests

--
8ecb6ec by scxfjiang <sc.xfjiang@gmail.com>:

update: replace str

--
338d3af by scxfjiang <sc.xfjiang@gmail.com>:

get rid of macros

Merging this change closes #16938

FUTURE_COPYBARA_INTEGRATE_REVIEW=#16938 from ROCm:ci_dev_rccl_nanoo_fp8 338d3af
PiperOrigin-RevId: 675635116
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 18, 2024
… tests

Imported from GitHub PR openxla/xla#16938

This PR adds support for NANOO FP8 data format in the collaborative communication unit tests.
- For the context on OCP FP8 and NANOO FP8, please refer to this comment:
google/flax#3993 (comment)
- The unit tests in this PR are similar to GEMM unit test introduced in the following PR to be able to deal with both OCP and NANOO fp8 formats:
openxla/xla#10488
Copybara import of the project:

--
0fc74ccae6cfcaf4e8627ea338ee03783af0626b by Wen Chen <Wen.Chen@amd.com>:

[AMD] Added NCCL support for fp8e4m3fnuz and fp8e5m2fnuz.

--
d247af5cd33fe42698bb55ef1c18f32df8a02a21 by scxfjiang <sc.xfjiang@gmail.com>:

refactor tests for collective comm ops

--
6f8c418b3052f7c531896bd5f8cbbc7a766ef7fc by scxfjiang <sc.xfjiang@gmail.com>:

rafactor collective comm e2e tests

--
8ecb6ecf08a1536c5b3f8ba87e0e9f8813b1b359 by scxfjiang <sc.xfjiang@gmail.com>:

update: replace str

--
338d3af2ca1a32302fdfe9d7abee335d24539ee9 by scxfjiang <sc.xfjiang@gmail.com>:

get rid of macros

Merging this change closes #16938

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16938 from ROCm:ci_dev_rccl_nanoo_fp8 338d3af2ca1a32302fdfe9d7abee335d24539ee9
PiperOrigin-RevId: 675635116
@ScXfjiang
Contributor Author

Hi @reedwm, this PR hasn't been merged. Could you take a look at it? Many thanks!

copybara-service bot pushed a commit that referenced this pull request Sep 19, 2024
… tests

Imported from GitHub PR #16938

This PR adds support for NANOO FP8 data format in the collaborative communication unit tests.
- For the context on OCP FP8 and NANOO FP8, please refer to this comment:
google/flax#3993 (comment)
- The unit tests in this PR are similar to GEMM unit test introduced in the following PR to be able to deal with both OCP and NANOO fp8 formats:
#10488
Copybara import of the project:

--
0fc74cc by Wen Chen <Wen.Chen@amd.com>:

[AMD] Added NCCL support for fp8e4m3fnuz and fp8e5m2fnuz.

--
d247af5 by scxfjiang <sc.xfjiang@gmail.com>:

refactor tests for collective comm ops

--
6f8c418 by scxfjiang <sc.xfjiang@gmail.com>:

rafactor collective comm e2e tests

--
8ecb6ec by scxfjiang <sc.xfjiang@gmail.com>:

update: replace str

--
338d3af by scxfjiang <sc.xfjiang@gmail.com>:

get rid of macros

Merging this change closes #16938

FUTURE_COPYBARA_INTEGRATE_REVIEW=#16938 from ROCm:ci_dev_rccl_nanoo_fp8 338d3af
PiperOrigin-RevId: 676515264
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 19, 2024
… tests

Imported from GitHub PR openxla/xla#16938

This PR adds support for NANOO FP8 data format in the collaborative communication unit tests.
- For the context on OCP FP8 and NANOO FP8, please refer to this comment:
google/flax#3993 (comment)
- The unit tests in this PR are similar to GEMM unit test introduced in the following PR to be able to deal with both OCP and NANOO fp8 formats:
openxla/xla#10488
Copybara import of the project:

--
0fc74ccae6cfcaf4e8627ea338ee03783af0626b by Wen Chen <Wen.Chen@amd.com>:

[AMD] Added NCCL support for fp8e4m3fnuz and fp8e5m2fnuz.

--
d247af5cd33fe42698bb55ef1c18f32df8a02a21 by scxfjiang <sc.xfjiang@gmail.com>:

refactor tests for collective comm ops

--
6f8c418b3052f7c531896bd5f8cbbc7a766ef7fc by scxfjiang <sc.xfjiang@gmail.com>:

rafactor collective comm e2e tests

--
8ecb6ecf08a1536c5b3f8ba87e0e9f8813b1b359 by scxfjiang <sc.xfjiang@gmail.com>:

update: replace str

--
338d3af2ca1a32302fdfe9d7abee335d24539ee9 by scxfjiang <sc.xfjiang@gmail.com>:

get rid of macros

Merging this change closes #16938

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16938 from ROCm:ci_dev_rccl_nanoo_fp8 338d3af2ca1a32302fdfe9d7abee335d24539ee9
PiperOrigin-RevId: 676515264
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 20, 2024
… tests

Imported from GitHub PR openxla/xla#16938

This PR adds support for NANOO FP8 data format in the collaborative communication unit tests.
- For the context on OCP FP8 and NANOO FP8, please refer to this comment:
google/flax#3993 (comment)
- The unit tests in this PR are similar to GEMM unit test introduced in the following PR to be able to deal with both OCP and NANOO fp8 formats:
openxla/xla#10488
Copybara import of the project:

--
0fc74ccae6cfcaf4e8627ea338ee03783af0626b by Wen Chen <Wen.Chen@amd.com>:

[AMD] Added NCCL support for fp8e4m3fnuz and fp8e5m2fnuz.

--
d247af5cd33fe42698bb55ef1c18f32df8a02a21 by scxfjiang <sc.xfjiang@gmail.com>:

refactor tests for collective comm ops

--
6f8c418b3052f7c531896bd5f8cbbc7a766ef7fc by scxfjiang <sc.xfjiang@gmail.com>:

rafactor collective comm e2e tests

--
8ecb6ecf08a1536c5b3f8ba87e0e9f8813b1b359 by scxfjiang <sc.xfjiang@gmail.com>:

update: replace str

--
338d3af2ca1a32302fdfe9d7abee335d24539ee9 by scxfjiang <sc.xfjiang@gmail.com>:

get rid of macros

Merging this change closes #16938

PiperOrigin-RevId: 676615012
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 20, 2024
FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16938 from ROCm:ci_dev_rccl_nanoo_fp8 338d3af2ca1a32302fdfe9d7abee335d24539ee9
PiperOrigin-RevId: 671073597
copybara-service bot pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 20, 2024
This test verifies whether the API v2 packages can be imported from the
current build. It utilizes the `_api/v2/api_packages.txt` list of packages from
the local wheel file specified in the `requirements_lock_<python_version>.txt`.

The test should be executed after the TF wheel was built and put into `dist` dir inside Tensorflow repository.

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#16938 from ROCm:ci_dev_rccl_nanoo_fp8 338d3af2ca1a32302fdfe9d7abee335d24539ee9
PiperOrigin-RevId: 673046193
@ScXfjiang ScXfjiang deleted the ci_dev_rccl_nanoo_fp8 branch September 20, 2024 20:13
ScXfjiang added a commit to ROCm/xla that referenced this pull request Sep 20, 2024
…on unit tests

Imported from GitHub PR openxla#16938

This PR adds support for NANOO FP8 data format in the collaborative communication unit tests.
- For the context on OCP FP8 and NANOO FP8, please refer to this comment:
google/flax#3993 (comment)
- The unit tests in this PR are similar to GEMM unit test introduced in the following PR to be able to deal with both OCP and NANOO fp8 formats:
openxla#10488
Copybara import of the project:

--
0fc74cc by Wen Chen <Wen.Chen@amd.com>:

[AMD] Added NCCL support for fp8e4m3fnuz and fp8e5m2fnuz.

--
d247af5 by scxfjiang <sc.xfjiang@gmail.com>:

refactor tests for collective comm ops

--
6f8c418 by scxfjiang <sc.xfjiang@gmail.com>:

rafactor collective comm e2e tests

--
8ecb6ec by scxfjiang <sc.xfjiang@gmail.com>:

update: replace str

--
338d3af by scxfjiang <sc.xfjiang@gmail.com>:

get rid of macros

Merging this change closes openxla#16938

COPYBARA_INTEGRATE_REVIEW=openxla#16938 from ROCm:ci_dev_rccl_nanoo_fp8 338d3af
PiperOrigin-RevId: 676615012
ScXfjiang added a commit to ROCm/tensorflow-upstream that referenced this pull request Sep 20, 2024
…ation unit tests

Imported from GitHub PR openxla/xla#16938

This PR adds support for NANOO FP8 data format in the collaborative communication unit tests.
- For the context on OCP FP8 and NANOO FP8, please refer to this comment:
google/flax#3993 (comment)
- The unit tests in this PR are similar to GEMM unit test introduced in the following PR to be able to deal with both OCP and NANOO fp8 formats:
openxla/xla#10488
Copybara import of the project:

--
0fc74ccae6cfcaf4e8627ea338ee03783af0626b by Wen Chen <Wen.Chen@amd.com>:

[AMD] Added NCCL support for fp8e4m3fnuz and fp8e5m2fnuz.

--
d247af5cd33fe42698bb55ef1c18f32df8a02a21 by scxfjiang <sc.xfjiang@gmail.com>:

refactor tests for collective comm ops

--
6f8c418b3052f7c531896bd5f8cbbc7a766ef7fc by scxfjiang <sc.xfjiang@gmail.com>:

rafactor collective comm e2e tests

--
8ecb6ecf08a1536c5b3f8ba87e0e9f8813b1b359 by scxfjiang <sc.xfjiang@gmail.com>:

update: replace str

--
338d3af2ca1a32302fdfe9d7abee335d24539ee9 by scxfjiang <sc.xfjiang@gmail.com>:

get rid of macros

Merging this change closes tensorflow#16938

PiperOrigin-RevId: 676615012
4 participants