Consider ROCm releases with upstream llvm compatibility #263
I was speaking with some of the compiler guys, and I believe the intention is to move towards using stable LLVM, or at least to allow compilation against stable LLVM. Unfortunately, there's no timeline, and it seems like a fairly involved task due to how their code-flow processes work. For now, I've just been maintaining my own branch of compiler support: it's mostly reverts or cherry-picks to get it to work. It's not perfect, but it keeps the Fedora packages moving. Some downstream ROCm components might expect a newer LLVM, but this component is definitely the most sensitive. I suspect when I eventually get around to packaging HIP in Fedora, the package set will be more sensitive to the LLVM version.
I hope stable releases of the downstream projects can be synced with LLVM. It would be better if they were upstreamed (especially rocm-device-libs).
From my experience, rocm-device-libs and rocm-compilersupport have the closest relationship with LLVM. The first provides bitcodes, and the second currently contains comgr, which as I understand it is a library that calls the clang driver (through its C++ API) to compile and inspect GPU code objects. HIP is a runtime API + hipcc, the perl wrapper (for AOT), + hiprtc (the wrapper around comgr, for JIT), and its relationship with llvm/clang is somewhat weaker compared to the previous two.
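(For readers who haven't touched comgr: below is a rough, untested sketch of what driving its public C API looks like, based on my reading of the amd_comgr.h header in ROCm-CompilerSupport. The include path, ISA string, and kernel source are placeholders, and error checking is omitted.)

```c
/* Untested sketch: compile a GPU source string to LLVM bitcode via comgr.
 * Names follow amd_comgr.h; paths/strings below are placeholder assumptions. */
#include <amd_comgr/amd_comgr.h>
#include <stdio.h>
#include <string.h>

static const char kSource[] =
    "kernel void k(global int *out) { out[0] = 42; }";

int main(void) {
  /* Wrap the source text in a comgr data object. */
  amd_comgr_data_t src;
  amd_comgr_create_data(AMD_COMGR_DATA_KIND_SOURCE, &src);
  amd_comgr_set_data(src, strlen(kSource), kSource);
  amd_comgr_set_data_name(src, "k.cl");

  /* Input and output data sets for the compile action. */
  amd_comgr_data_set_t in, out;
  amd_comgr_create_data_set(&in);
  amd_comgr_create_data_set(&out);
  amd_comgr_data_set_add(in, src);

  /* Describe the compilation: source language and target ISA. */
  amd_comgr_action_info_t info;
  amd_comgr_create_action_info(&info);
  amd_comgr_action_info_set_language(info, AMD_COMGR_LANGUAGE_OPENCL_1_2);
  amd_comgr_action_info_set_isa_name(info, "amdgcn-amd-amdhsa--gfx900");

  /* This is the step where comgr internally drives the clang driver, which
   * is why comgr is so tightly coupled to the LLVM/clang version. */
  amd_comgr_status_t st = amd_comgr_do_action(
      AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC, info, in, out);
  printf("compile %s\n",
         st == AMD_COMGR_STATUS_SUCCESS ? "succeeded" : "failed");
  return st == AMD_COMGR_STATUS_SUCCESS ? 0 : 1;
}
```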
llvm#60313 & https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/45: 4 of the comgr test cases use a feature that calls an amdgpu kernel from within an amdgpu kernel, but the feature is not well implemented and not upstreamed. Last time I checked, Fedora had patched out these 4 tests. In Gentoo I patched the tests to avoid calling other kernels inside a kernel, so the compilations succeed with upstream clang. This removes the test failures while keeping the purpose of testing comgr's basic compilation capabilities.
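(To make the patched test shape concrete, here is a purely illustrative pair of snippets in OpenCL C; these are not the actual comgr test sources. The first shape has one kernel calling another kernel directly, which upstream clang/amdgpu codegen does not fully support; the second hoists the shared code into an ordinary function so the test still exercises the compile path.)

```c
/* Illustrative OpenCL C only; not the real comgr test files. The two shapes
 * are alternatives shown side by side, not meant to be compiled together. */

/* Original shape: a kernel calling another kernel as a plain function call.
 * Valid OpenCL C, but the amdgpu kernel-calls-kernel path is not fully
 * implemented in upstream clang/LLVM. */
kernel void callee(global int *out) { out[0] = 42; }
kernel void caller(global int *out) { callee(out); }

/* Patched shape: the shared body moves into an ordinary function, so both
 * kernels compile with upstream clang while the test still covers basic
 * compilation through comgr. */
void body(global int *out) { out[0] = 42; }
kernel void callee2(global int *out) { body(out); }
kernel void caller2(global int *out) { body(out); }
```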
And in the development branch a potential breakage was found between llvm/clang and rocm-device-libs: https://reviews.llvm.org/D142507 & ROCm/ROCm-Device-Libs@8dc779e
@searlmc1 as FYI
Yes that was me, and I just adopted your patch. Thank you!
Thanks for pointing this out. I guess it means we should keep device libs at 5.4 for LLVM 16? I'm a bit uncomfortable with this response:
And in the amd-stg-open branch they added
Yes, though there are uncertainties. If they (ROCm) revert ROCm/ROCm-Device-Libs@8dc779e in the ROCm-5.5.x releases then things will be fine. Otherwise we will have to revert it ourselves for llvm-16 compatibility.
OK, after some thought, for the meantime I think I'm going to try to maintain some branches for LLVM versions: https://github.com/Mystro256/ROCm-Device-Libs/tree/release/16.x Feel free to contribute if you are interested. I'm going to try evaluating it when Fedora bumps to LLVM 16.
Haven't cut a release yet, but for the HIP portion of ROCm I now have Bazel build files in rules_ll. This project makes it possible to write a target like `ll_binary(name="hip_example", srcs=["hello.hip"], compilation_mode="hip_amdgpu")`, which can then be built and run with a single command invocation; it all happens automatically 🥳 This is of course highly experimental. At the moment it only works with out-of-tree rules_ll, and not with the regular installation instructions. I'll create a release in the next few days and then create an issue that explains this in more detail.
@littlewu2508 Is this ticket still relevant? Thanks!
Hi. The general issue is still relevant. For example, the ROCm 6.3 release notes say that it uses LLVM 18, but when trying to use it with upstream LLVM 18 it breaks because of LLVM 19-specific intrinsics. I'm using Julia and the AMDGPU.jl package for AMD GPU programming, and the following simple MWE:

```julia
function ker!(x)
    x[1] = AMDGPU.Device.sync_workgroup_or(Cint(0))
    return
end
```

triggers:

```
ERROR: LoadError: InvalidIRError: compiling MethodInstance for ker!(::AMDGPU.Device.ROCDeviceVector{Int64, 1}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to llvm.amdgcn.permlanex16.i32)
```

The devlibs being tied to an LLVM version in general creates a bit of a problem for Julia, since each Julia release is based on a specific LLVM version. So far we've been patching/recompiling them with the correct LLVM version and shipping our own libraries, but I wish the process could be improved.
Planning to kick off some internal discussion to see if there's anything we can do to improve the situation here.
One of the major issues for ROCm distribution packaging is compatibility with upstream LLVM, due to the asynchronous development cycles.
For example, suppose feature A is pushed to ROCm's LLVM fork and then pushed to upstream LLVM. In the unfortunate case, A needs to wait about half a year to enter the next stable LLVM release, while the next ROCm release comes sooner, shipping A and the components that depend on A.
What's more, A is implemented on top of LLVM version X, but when upstreamed it lands in the X+1 release. However, the new ROCm release Y ships A with LLVM X (the ROCm fork). A distribution then cannot ship llvm-X together with ROCm-Y, because ROCm-Y depends on A; llvm-X+1 is also incompatible with ROCm-Y, because ROCm-Y is based on LLVM X.
The situation nowadays is much better; only a few patches are needed to work around it. But it still makes maintaining ROCm distribution packages difficult. It would be better if the latest ROCm source releases could be compatible with an upstream LLVM release.
@emollier @cgmb @Mystro256 @Madouura @tpkessler Distribution packagers can also reference this issue when reporting and fixing such incompatibilities, and share information to reduce the workload.