
[ROCM] Support TF_ROCM_CLANG for builds with clang host compiler #2192

Merged
merged 4 commits into develop-upstream on Dec 9, 2023

Conversation

draganmladjenovic

Remove TF_NEED_CLANG workarounds

@i-chaochen

retest Ubuntu-CPU please

@jayfurmanek

We'll also need to update the symlink of rocm.bazelrc to point to gpu.bazelrc (instead of gpu_gcc.bazelrc)

@draganmladjenovic
Author

Retest Ubuntu-GPU-single please
Retest Ubuntu-CPU please

@i-chaochen
Copy link

I wonder what the situation is with this PR. Are we able to switch to clang now?

@draganmladjenovic
Author

Retest Ubuntu-CPU please.

@draganmladjenovic draganmladjenovic force-pushed the clang_build branch 3 times, most recently from 065bd06 to 2a4f1f7 Compare October 28, 2023 08:05
@draganmladjenovic
Author

Retest Ubuntu-GPU-multi please.

@draganmladjenovic draganmladjenovic force-pushed the clang_build branch 4 times, most recently from 6300e9a to cbdf070 Compare October 28, 2023 11:17
@draganmladjenovic
Author

Retest Ubuntu-CPU please.
Retest Ubuntu-GPU-single please.

@draganmladjenovic
Author

@jayfurmanek I think this one is ready. I slightly dislike having to change the root .bazelrc in the last commit. An alternative is to create /userlocal/ in Dockerfile.rocm and point bazel in run_gpu_single/multi.sh to use --config=/userlocal/gpu.bazelrc.

@draganmladjenovic
Author

@i-chaochen Where can I update the docker image for ThirdParty-XLA ci?

@i-chaochen

ThirdParty-XLA CI is using the same tensorflow docker image (rocm/tensorflow-build:latest-python3.9-rocm5.7.0). May I ask what you mean by updating it, and why you want to update it?

@draganmladjenovic
Author

> ThirdParty-XLA CI is using the same tensorflow docker image (rocm/tensorflow-build:latest-python3.9-rocm5.7.0). May I ask what you mean by updating it, and why you want to update it?

It is missing clang-16, so the PR fails, and cannot be merged.

@draganmladjenovic draganmladjenovic force-pushed the clang_build branch 2 times, most recently from be01b2d to 98885a3 Compare November 13, 2023 23:48
@draganmladjenovic
Author

> ThirdParty-XLA CI is using the same tensorflow docker image (rocm/tensorflow-build:latest-python3.9-rocm5.7.0). May I ask what you mean by updating it, and why you want to update it?
>
> It is missing clang-16, so the PR fails, and cannot be merged.

My bad. This CI is not blocking the merge. @jayfurmanek The only thing still missing is the review. Should I add more reviewers?

build:rocm --action_env=CLANG_COMPILER_PATH="/usr/lib/llvm-16/bin/clang"
# Disable unused-result on rocm builds.
build:rocm --copt="-Wno-error=unused-result"


Since we're switching to use gpu.bazelrc, could you also add xla_cpp_filters from gpu_gcc.bazelrc? Thanks!

@@ -15,6 +15,21 @@ build --action_env=CACHEBUSTER=565341047
# Build options for GPU Linux
build --config=release_gpu_linux


This line pulls in the cuda config from the .bazelrc.
We'll need a "release_rocm_linux" config there, I think.
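A "release_rocm_linux" config along the lines suggested might look roughly like this (a hypothetical sketch: the config name comes from the comment above, but the flag set is an illustrative assumption, not from this PR):

```
# Hypothetical release_rocm_linux config; the contents below are
# illustrative, not taken from this PR.
build:release_rocm_linux --config=rocm
build:release_rocm_linux --repo_env=TF_NEED_ROCM=1
build:release_rocm_linux --action_env=TF_ROCM_CLANG="1"
```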

build:rocm --define=tensorflow_mkldnn_contraction_kernel=0
build:rocm --repo_env TF_NEED_ROCM=1
build:rocm --action_env=TF_ROCM_CLANG="1"
build:rocm --repo_env=BAZEL_COMPILER="/usr/lib/llvm-16/bin/clang"


Can we update to clang-17 to match what's upstream now?


That will need to be changed in the container as well (merged separately)

Author


> Can we update to clang-17 to match what's upstream now?

Thanks. Missed that. Will update.
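For anyone reproducing this locally, the environment for a clang host-compiler ROCm build might be set up as follows (a sketch: the variable names mirror the bazelrc snippets in this PR, and the llvm-17 path is an assumption based on the clang-17 discussion above):

```shell
# Illustrative environment for a clang host-compiler ROCm build.
# The llvm-17 path is an assumption (per the clang-17 review comment)
# and may differ between docker images.
export TF_NEED_ROCM=1
export TF_ROCM_CLANG=1
export CLANG_COMPILER_PATH=/usr/lib/llvm-17/bin/clang
```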

@@ -15,6 +15,21 @@ build --action_env=CACHEBUSTER=565341047
# Build options for GPU Linux
build --config=release_gpu_linux

# ROCM: Set up compilation ROCM version and paths
build:rocm --linkopt="-fuse-ld=gold"


I think we want to use lld for the linker

Author


I think I had some issues getting lld on all docker images. I will have to check that.
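If lld does turn out to be available in all the docker images, the reviewer's suggestion would be a one-line change to the linkopt shown above (a sketch, assuming lld is installed in the build image):

```
# Swap the gold linker for lld (assumes lld is present in the image).
build:rocm --linkopt="-fuse-ld=lld"
```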

@draganmladjenovic draganmladjenovic force-pushed the clang_build branch 5 times, most recently from c549c5f to 43db4b8 Compare December 7, 2023 20:43
Test docker doesn't have /userlocal/gpu.bazelrc, so flip the switch
in the root .bazelrc.
@jayfurmanek

retest gpu-pycpp please

@jayfurmanek

retest gpu-non-pip-multi please

@jayfurmanek

retest Ubuntu-GPU-multi please

@draganmladjenovic
Author

Retest Ubuntu-GPU-single please.


@jayfurmanek jayfurmanek left a comment


Looks good!
Please tag the repo before and after merging:
pre-clang-merge
post-clang-merge

@draganmladjenovic draganmladjenovic merged commit 300e4cb into develop-upstream Dec 9, 2023
9 of 10 checks passed
3 participants