Consider ROCm releases with upstream llvm compatibility #263
I was speaking with some of the compiler guys, and I believe the intention is to move towards using stable LLVM, or at least to allow compilation against stable LLVM. Unfortunately, there's no timeline, and it seems like a fairly involved task due to how their code-flow processes work. For now, I've just been maintaining my own branch of compiler support: it's mostly reverts or cherry-picks to get it to work. It's not perfect, but it keeps the Fedora packages moving. Some downstream ROCm components might expect a newer LLVM, but this component is definitely the most sensitive. I suspect when I eventually get around to packaging HIP in Fedora, the package set will be more sensitive to the LLVM version.
I hope stable releases of the downstream projects can be synced with LLVM. It would be better if they were upstreamed (especially rocm-device-libs).
From my experience, rocm-device-libs and rocm-compilersupport have the closest relationship with LLVM. The first provides bitcodes, and the second currently contains comgr, which as I understand it is a library that calls the clang driver (through its C++ API) to compile and inspect GPU code objects. HIP is a runtime API + hipcc, the perl wrapper (for AOT), + hiprtc (the wrapper around comgr, for JIT), and its relationship with llvm/clang is somewhat weaker compared to the previous two.
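(For readers who haven't touched comgr: below is a rough, untested sketch of what driving its public C API looks like, based on my reading of the amd_comgr.h header in ROCm-CompilerSupport. The include path, ISA string, and kernel source are placeholders, and error checking is omitted.)

```c
/* Untested sketch: compile a GPU source string to LLVM bitcode via comgr.
 * Names follow amd_comgr.h; paths/strings below are placeholder assumptions. */
#include <amd_comgr/amd_comgr.h>
#include <stdio.h>
#include <string.h>

static const char kSource[] =
    "kernel void k(global int *out) { out[0] = 42; }";

int main(void) {
  /* Wrap the source text in a comgr data object. */
  amd_comgr_data_t src;
  amd_comgr_create_data(AMD_COMGR_DATA_KIND_SOURCE, &src);
  amd_comgr_set_data(src, strlen(kSource), kSource);
  amd_comgr_set_data_name(src, "k.cl");

  /* Input and output data sets for the compile action. */
  amd_comgr_data_set_t in, out;
  amd_comgr_create_data_set(&in);
  amd_comgr_create_data_set(&out);
  amd_comgr_data_set_add(in, src);

  /* Describe the compilation: source language and target ISA. */
  amd_comgr_action_info_t info;
  amd_comgr_create_action_info(&info);
  amd_comgr_action_info_set_language(info, AMD_COMGR_LANGUAGE_OPENCL_1_2);
  amd_comgr_action_info_set_isa_name(info, "amdgcn-amd-amdhsa--gfx900");

  /* This is the step where comgr internally drives the clang driver, which
   * is why comgr is so tightly coupled to the LLVM/clang version. */
  amd_comgr_status_t st = amd_comgr_do_action(
      AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC, info, in, out);
  printf("compile %s\n",
         st == AMD_COMGR_STATUS_SUCCESS ? "succeeded" : "failed");
  return st == AMD_COMGR_STATUS_SUCCESS ? 0 : 1;
}
```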
llvm#60313 & https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/45: 4 of the comgr test cases use a feature that calls an amdgpu kernel from within an amdgpu kernel, but the feature is not well implemented and not upstreamed. Last time I checked, Fedora had patched out these 4 tests. In Gentoo I patched the tests to avoid calling other kernels inside a kernel, so the compilations succeed with upstream clang. This removes the test failures while keeping the purpose of testing comgr's basic compilation capabilities.
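(To make the patched test shape concrete, here is a purely illustrative pair of snippets in OpenCL C; these are not the actual comgr test sources. The first shape has one kernel calling another kernel directly, which upstream clang/amdgpu codegen does not fully support; the second hoists the shared code into an ordinary function so the test still exercises the compile path.)

```c
/* Illustrative OpenCL C only; not the real comgr test files. The two shapes
 * are alternatives shown side by side, not meant to be compiled together. */

/* Original shape: a kernel calling another kernel as a plain function call.
 * Valid OpenCL C, but the amdgpu kernel-calls-kernel path is not fully
 * implemented in upstream clang/LLVM. */
kernel void callee(global int *out) { out[0] = 42; }
kernel void caller(global int *out) { callee(out); }

/* Patched shape: the shared body moves into an ordinary function, so both
 * kernels compile with upstream clang while the test still covers basic
 * compilation through comgr. */
void body(global int *out) { out[0] = 42; }
kernel void callee2(global int *out) { body(out); }
kernel void caller2(global int *out) { body(out); }
```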
And in the development branch a potential breakage was found between llvm/clang and rocm-device-libs: https://reviews.llvm.org/D142507 & ROCm/ROCm-Device-Libs@8dc779e
@searlmc1 as FYI
Yes that was me, and I just adopted your patch. Thank you!
Thanks for pointing this out. I guess it means we should keep device libs at 5.4 for LLVM 16? I'm a bit uncomfortable with this response:
And in the amd-stg-open branch they added
Yes, though there are uncertainties. If they (ROCm) revert ROCm/ROCm-Device-Libs@8dc779e in the ROCm-5.5.x releases then things will be fine. Otherwise we will have to revert it ourselves for llvm-16 compatibility.
OK, after some thought, for the meantime I think I'm going to try to maintain some branches for LLVM versions: https://github.com/Mystro256/ROCm-Device-Libs/tree/release/16.x Feel free to contribute if you are interested. I'm going to try evaluating it when Fedora bumps to LLVM 16.
Haven't cut a release yet, but for the HIP portion of ROCm I now have Bazel build files in rules_ll. This project makes it possible to write a target like `ll_binary(name="hip_example", srcs=["hello.hip"], compilation_mode="hip_amdgpu")`, which can then be built and run with a single command invocation; it all happens automatically 🥳 This is of course highly experimental. At the moment it only works with out-of-tree rules_ll, and not with the regular installation instructions. I'll create a release in the next few days and then create an issue that explains this in more detail.
@littlewu2508 Is this ticket still relevant? Thanks!
Hi. The general issue is still relevant. For example, the ROCm 6.3 release notes say that it uses LLVM 18, but when trying to use it with upstream LLVM 18 it breaks because of LLVM 19-specific intrinsics. I'm using Julia and the AMDGPU.jl package for AMD GPU programming, and the following simple MWE:

```julia
function ker!(x)
    x[1] = AMDGPU.Device.sync_workgroup_or(Cint(0))
    return
end
```

triggers:

```
ERROR: LoadError: InvalidIRError: compiling MethodInstance for ker!(::AMDGPU.Device.ROCDeviceVector{Int64, 1}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to llvm.amdgcn.permlanex16.i32)
```

The devlibs being tied to an LLVM version in general creates a bit of a problem for Julia, since each Julia release is based on a specific LLVM version. So far we've been patching/recompiling them with the correct LLVM version and shipping our own libraries, but I wish the process could be improved.
Planning to kick off some internal discussion to see if there's anything we can do to improve the situation here.
One of the major issues for ROCm distribution packaging is compatibility with upstream LLVM, due to the asynchronous development cycles.
For example, suppose feature A is pushed to ROCm's LLVM fork and then pushed to upstream LLVM. In the unfortunate case, A needs to wait about half a year to enter the next stable LLVM release, while the next ROCm release comes sooner, shipping A and the components that depend on A.
What's more, A is implemented on top of LLVM version X, but when upstreamed it lands in the X+1 release. However, the new ROCm release Y ships A with LLVM X (the ROCm fork). A distribution then cannot ship llvm-X together with ROCm-Y, because ROCm-Y depends on A; llvm-X+1 is also incompatible with ROCm-Y, because ROCm-Y is based on LLVM X.
The situation nowadays is much better; only a few patches are needed to work around it. But it still makes maintaining ROCm distribution packages difficult. It would be better if the latest ROCm source releases could be compatible with an upstream LLVM release.
@emollier @cgmb @Mystro256 @Madouura @tpkessler Distribution packagers can also reference this issue when reporting and fixing such incompatibilities, and share information to reduce the workload.