[SYCL][ROCm] Use offload-arch instead of mcpu for AMD arch #4239
@malixian This patch should fix the issue discussed in:
AGindinson
left a comment
I believe clang/test/Driver/sycl-offload-amdgcn.cpp should be expanded, at least to check the error message and enforce valid command lines for the main test cases.
elizabethandrews
left a comment
Please add a test for the new diagnostic
5a790a2 to
a634bd9
This patch switches SYCL applications targeting AMD from `-mcpu` to `-Xsycl-target-backend --offload-arch`. Before this patch, the offloading arch wasn't set correctly for AMD architectures. This fixes an issue with HIP discussed in intel#4133 concerning `v4` in the HIP part of the triple. Without the `v4`, HIP seems to ignore the fact that the offloading arch is missing from the triple, which is why there was originally a workaround that forced SYCL not to use `v4`. By setting the offloading arch, this patch fixes the issue properly: the triple with `v4` now works because it also contains the offloading architecture.
Co-authored-by: Artem Gindinson <artem.gindinson@intel.com>
I've updated I'm still seeing one test failing with this when running
a634bd9 to
7331029
    if (Triple.isAMDGCN() && llvm::none_of(GpuArchList, [&](auto &P) {
          return P.first.isAMDGCN();
        })) {
      C.getDriver().Diag(clang::diag::err_drv_sycl_missing_amdgpu_arch);
How about adding a default AMD GPU arch?
I didn't go for a default AMD GPU arch because, as I understand it, we need to specify the exact GPU architecture for AMD, so a default would only work for a very specific type of GPU. In a lot of cases users would still need to specify the architecture manually. I think it is better to require the architecture to always be set manually and have a clear diagnostic than to have a default architecture that rarely works and a more confusing error message from HIP.
This is different with NVIDIA because SM_50 covers a lot of different GPUs, so in most cases it will work out of the box and the user won't have to set the architecture.
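For readers following the thread, the check being discussed can be sketched outside the driver roughly as below. This is only an illustration under simplifying assumptions: triples are modelled as plain strings rather than `llvm::Triple`, `std::none_of` stands in for `llvm::none_of`, and the names `TargetArch`, `isAMDGCN` (as a free function) and `missingAMDGPUArch` are hypothetical, not part of the patch.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the driver's (triple, arch) pairs; the real
// GpuArchList in the patch holds llvm::Triple objects, not strings.
using TargetArch = std::pair<std::string, std::string>;

// Simplified substitute for llvm::Triple::isAMDGCN(): treats any triple
// whose text starts with "amdgcn" as an AMD GCN target.
bool isAMDGCN(const std::string &Triple) {
  return Triple.rfind("amdgcn", 0) == 0;
}

// Returns true when an AMD GCN target was requested but no architecture was
// supplied for any AMD GCN triple, i.e. the case where the patch emits
// err_drv_sycl_missing_amdgpu_arch instead of picking a default.
bool missingAMDGPUArch(const std::string &Triple,
                       const std::vector<TargetArch> &GpuArchList) {
  return isAMDGCN(Triple) &&
         std::none_of(GpuArchList.begin(), GpuArchList.end(),
                      [](const TargetArch &P) { return isAMDGCN(P.first); });
}
```

In this sketch, an AMD triple with an empty arch list is reported as an error, while an NVIDIA triple with an empty list is not, which mirrors the asymmetry described above.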
Co-authored-by: Victor Lomuller <victor@codeplay.com>
Naghasan
left a comment
LGTM
elizabethandrews
left a comment
LGTM
@AGindinson, @hchilama, @mdtoguchi, ping.
AaronBallman
left a comment
LGTM
hchilama
left a comment
LGTM
…4463)

#4175 introduced automatic addition of the generic spir64 device target when any section of the input objects had this triple assigned to it. As a result, the actual list of toolchains started exceeding the user-provided one by 1 item.

After #4239, the above became a problem. The dispatch of -Xsycl-target-* arguments started happening earlier in the flow, which broke the following use case:

```
clang++ -fsycl -fsycl-targets=spir64_gen gen-obj.o gen-and-spir64-obj.o -Xsycl-target-backend "-device *"
```

A fix for now is to ignore the autodetected spir64 target when propagating the -Xsycl-target-backend arguments. A permanent solution would involve redesigning the -Xsycl-target-backend handling so that it takes place only once in the flow, or delaying the addition of the autodetected generic triple to the list of device targets.

Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
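
The interim fix described in this commit message can be pictured with a small standalone sketch. It is not the driver's code: `DeviceTarget`, its `AutoDetected` flag and `backendArgTargets` are hypothetical names, and triples are shortened to strings such as `spir64_gen` for readability. The idea shown is only that the autodetected generic spir64 entry is skipped when deciding which targets receive the `-Xsycl-target-backend` arguments.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one entry in the driver's list of device targets.
struct DeviceTarget {
  std::string Triple;
  bool AutoDetected; // true if added by the driver, not via -fsycl-targets
};

// Returns the triples that should receive -Xsycl-target-backend arguments:
// every target except an autodetected generic spir64 one, so the list
// matches what the user actually passed on the command line.
std::vector<std::string>
backendArgTargets(const std::vector<DeviceTarget> &Targets) {
  std::vector<std::string> Result;
  for (const DeviceTarget &T : Targets) {
    if (T.AutoDetected && T.Triple == "spir64")
      continue; // skip the implicit generic target
    Result.push_back(T.Triple);
  }
  return Result;
}
```

With the command line above, the modelled list would hold a user-provided `spir64_gen` entry plus an autodetected `spir64` entry, and only `spir64_gen` would be returned. A spir64 target the user requested explicitly is kept.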