ZLUDA: init at 3 #288644
So the problem currently is that while compiling with the same buildInputs works when just running the
Not sure how to fix the issue so help is appreciated. |
Yup, just ran out of disk space for a couple of days, which prevented me from building things 😅. Anyways, getting closer. With the latest changes the llvm part actually seems to compile fine, but the build errors out later in compilation with the following:
zluda/build.rs is the following (https://github.com/vosen/ZLUDA/blob/master/zluda/build.rs):
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/help-with-packaging-complex-rust-library/40197/1 |
this now gets me:
|
Thanks for verifying that it's at least not just me. This seems to match what I'm getting with the current branch #288644 (comment) (The comment after that was just local testing while trying to bypass this error) Any ideas? |
i feel a bit over my head there 🙈, i've yet to really get into rust |
It seems like this is an issue with |
Did some testing during the weekend, using So @ulrikstrid if you have time to test that idea properly I'd appreciate it :) |
I got it to build: pass.patch.txt, mostly by looking for another package that deals with vergen: it's completely sidestepping it.
Now, I don't think the resulting build output will work as is:
So, more work remaining. |
@jcaesar great! I've added your changes and indeed the build finishes now. Not resulting in binaries is not an issue as ZLUDA on Linux is a library and you just need to add it to your LD_LIBRARY_PATH. e.g. you'd basically do That said, the result is not yet what's expected. Currently:
and when trying to run it:
while manually building it outside nixpkgs resulted in:
and running it:
|
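As a side note on the `LD_LIBRARY_PATH` usage mentioned above: a minimal sketch of prepending a library directory for a single command, without clobbering whatever is already set. The `run_with_libs` helper and the `/path/to/zluda/lib` directory are placeholders made up for illustration, not something the package provides.

```shell
set -euo pipefail

# Hypothetical helper: run a command with <dir> prepended to LD_LIBRARY_PATH.
# usage: run_with_libs <dir> <command> [args...]
run_with_libs() {
  local dir="$1"
  shift
  # Append the previous value only if it is set and non-empty,
  # so we never end up with a trailing ':' (which means "current directory").
  LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}

# Placeholder path; substitute the directory that holds the built .so files.
run_with_libs /path/to/zluda/lib sh -c 'echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"'
```

The variable is only set for the wrapped command, so the surrounding shell is left untouched.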
Though those should be all .so files and missing ones are just symlinks (apart from .d files but I'd assume those aren't necessary). I'll check if not having the symlinks is the only remaining issue EDIT: yup, that's it. After creating those symlinks it works:
So the remaining question is: what's causing the symlinks not to be created automatically? I'd prefer not to add those manually in the nix package if possible. |
Added version with the symlinks manually created, this is the first version that actually results in a working package! |
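For anyone following along, creating such symlinks by hand looks roughly like the sketch below. The library and alias names (`libexample_impl.so`, `libexample.so`, `libexample.so.1`) and the `link_aliases` helper are purely hypothetical stand-ins; the actual file names are the ones from the listings earlier in this thread.

```shell
set -euo pipefail

# Hypothetical helper: create alias symlinks next to a built library.
# usage: link_aliases <dir> <real-file-name> <alias-name>...
link_aliases() {
  local dir="$1" target="$2"
  shift 2
  local name
  for name in "$@"; do
    # Relative target, so the links stay valid if the directory is
    # copied elsewhere as a whole (e.g. into an output lib directory).
    ln -sfn "$target" "$dir/$name"
  done
}

lib_dir="$(mktemp -d)"
: > "$lib_dir/libexample_impl.so"   # empty stand-in for a real shared object
link_aliases "$lib_dir" libexample_impl.so libexample.so libexample.so.1
ls -l "$lib_dir"
```

In a derivation, something along these lines would presumably run after the libraries are installed; whether upstream has its own mechanism for generating the links is still the open question here.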
pkgs/by-name/zl/zluda/package.nix
Outdated
# Comment out zluda_blaslt in Cargo.toml
sed -i '/zluda_blaslt/d' Cargo.toml
# TODO: investigate test failure (the test seems to require build time env vars that aren't set on linux?)
rm zluda_inject/tests/inject.rs
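For context, the effect of the `sed` line can be reproduced on a throwaway manifest. This is a minimal sketch assuming GNU sed (for `-i` without a suffix argument); the manifest contents and the `drop_member` helper are invented for illustration and are not the real ZLUDA `Cargo.toml`.

```shell
set -euo pipefail

# Hypothetical helper: delete every line of <manifest> that mentions <member>.
drop_member() {
  local member="$1" manifest="$2"
  sed -i "/${member}/d" "$manifest"
}

# Illustrative workspace manifest, NOT the upstream file.
demo_dir="$(mktemp -d)"
cat > "$demo_dir/Cargo.toml" <<'EOF'
[workspace]
members = [
    "zluda",
    "zluda_blas",
    "zluda_blaslt",
]
EOF

drop_member zluda_blaslt "$demo_dir/Cargo.toml"
cat "$demo_dir/Cargo.toml"
```

Note that `/pattern/d` deletes the matching lines rather than commenting them out, and it matches anywhere in the file, so any other line containing the same string would be dropped too.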
A reference to more detailed documentation of the errors is desirable: a gist with the build logs or, better yet, a GitHub issue. Also add a very short textual description of the issue so people can get the idea without opening the link.
E.g.
- "seems to require build time env vars that aren't set on linux" -> "[seems to ]require(s) variables X and Y during ZZZZ"
- "Comment out zluda_blaslt in Cargo.toml" -> "zluda_blaslt used upstream for XXXX, disabled because YYYY (link to the issue)"
- For inject: disable test written for windows only: https://github.com/vosen/ZLUDA/blob/774f4bcb37c39f876caf80ae0d39420fa4bc1c8b/zluda_inject/tests/inject.rs#L55 ? (Though I see quite a few packages that just set `doCheck = false;` with no explanation or comments like `tests fail`.)
- For blaslt: Sorry, already forgot, you'll have to reinvestigate this @errnoh
Added the above note for the inject.rs, starting a separate conversation to address the zluda_blaslt part
Result of 1 package failed to build:
EDIT: Ignore this, it was a timeout |
Resolved most of the remaining conversations in the latest commit. Thought also that it's probably a good idea to mention that the |
Main conversation still remaining is the commenting out of When left uncommented the build results in:
Compiling hip_common v0.0.0 (/build/ZLUDA/hip_common)
Compiling zluda_dark_api v0.0.0 (/build/ZLUDA/zluda_dark_api)
Compiling comgr v0.0.0 (/build/ZLUDA/comgr)
Compiling zluda_blaslt v0.0.0 (/build/ZLUDA/zluda_blaslt)
Compiling zluda_sparse v0.0.0 (/build/ZLUDA/zluda_sparse)
Compiling zluda_blas v0.0.0 (/build/ZLUDA/zluda_blas)
Compiling zluda_fft v0.0.0 (/build/ZLUDA/zluda_fft)
warning: crate `cublasLt` should have a snake case name
|
= help: convert the identifier to snake case: `cublas_lt`
= note: `#[warn(non_snake_case)]` on by default
error: linking with `/nix/store/4cjqvbp1jbkps185wl8qnbjpf8bdy8j9-gcc-wrapper-13.2.0/bin/cc` failed: exit status: 1
|
= note: LC_ALL="C" PATH="/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/bin:/nix/store/y027d3bvlaizbri04c1bzh28hqd6lj01-python3-3.11.7/bin:/nix/store/r7a1rz942f5yvbknm262sg369kwbv7b7-cargo-1.75.0/bin:/nix/store/qjv64w8q2higlmsa5wl9dxnizvqplkrp-rustc-wrapper-1.75.0/bin:/nix/store/hkhmxs4n1agpdpyamlh2b78pm9wch0br-cmake-3.27.9/bin:/nix/store/lnl2zcfs4gd0cj2mpc7744s63babv37g-clang-wrapper-16.0.6/bin:/nix/store/s0rk29zc6n3x6xmpb39rypac36k2gpbj-clang-16.0.6/bin:/nix/store/36wymklsa60bigdhb0p3139ws02r46lw-glibc-2.38-44-bin/bin:/nix/store/bicmg5gd50q6igk0y5mga1v0p1lk8f26-coreutils-9.4/bin:/nix/store/3avks95g4s9rij1s47ldzh7h93m43lss-binutils-wrapper-2.40/bin:/nix/store/2ab5740x0cy1d74qvbpl5s28qikmppl5-binutils-2.40/bin:/nix/store/4sf3mmnawkgjyyyzqz5nn8wm0gdvp0wa-auditable-cargo-1.75.0/bin:/nix/store/v3b4la4kh5l7dqzdyraqb1lyfrajfl5w-patchelf-0.15.0/bin:/nix/store/4cjqvbp1jbkps185wl8qnbjpf8bdy8j9-gcc-wrapper-13.2.0/bin:/nix/store/qs1nwzbp2ml3cxzsxihn82hl0w73snr0-gcc-13.2.0/bin:/nix/store/c53f8hagyblvx52zylsnqcc0b3nxbrcl-binutils-wrapper-2.40/bin:/nix/store/bicmg5gd50q6igk0y5mga1v0p1lk8f26-coreutils-9.4/bin:/nix/store/p6fd7piqrin2h0mqxzmvyxyr6pyivndj-findutils-4.9.0/bin:/nix/store/2d582qba31ii28nyrww9bzb00aq06d1g-diffutils-3.10/bin:/nix/store/vd92lhcxs39hbdnzj8ycak5wvj466s3l-gnused-4.9/bin:/nix/store/mn911d51n5lklwr3zy4mdhxa77wzancb-gnugrep-3.11/bin:/nix/store/h53ycc406fmbq3ff0n0rjxdzb6lk9zcn-gawk-5.2.2/bin:/nix/store/1ds6c0i7z4advdr0z210sxgvmq786h09-gnutar-1.35/bin:/nix/store/nf4fhdqgjka360nkibx1yg14gybwb018-gzip-1.13/bin:/nix/store/v3hp6kidlb9yz6j51a0wlbnpclqpi94f-bzip2-1.0.8-bin/bin:/nix/store/15xrks0frcgils8qxfkhspyg6gi9rxdh-gnumake-4.4.1/bin:/nix/store/5l50g7kzj7v0rdhshld1vx46rf2k5lf9-bash-5.2p26/bin:/nix/store/2pi9hb31np2vhy8r9lfih47rf9n51crz-patch-2.7.6/bin:/nix/store/h8vfiwhq6kmvrnj96w52n36c6qm4lbyl-xz-5.4.6-bin/bin:/nix/store/rn6yfzxwp12z0zqavxx1841mh0ypr7jg-file-5.45/bin" VSLANG="1033" 
"/nix/store/4cjqvbp1jbkps185wl8qnbjpf8bdy8j9-gcc-wrapper-13.2.0/bin/cc" "-Wl,--version-script=/build/rustcahiD0a/list" "-Wl,--no-undefined-version" "-m64" "/build/rustcahiD0a/symbols.o" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/cublasLt.cublasLt.aee62bd5d56fa81a-cgu.0.rcgu.o" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/cublasLt.47gcc4jonrh11t15.rcgu.o" "-Wl,--as-needed" "-L" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps" "-L" "/build/ZLUDA/target/release/deps" "-L" "/opt/rocm/lib/" "-L" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/build/libsqlite3-sys-6f6f9cf7ba865a7f/out" "-L" "/opt/rocm/lib/" "-L" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/build/lz4-sys-cbf01ed93ef0cfde/out" "-L" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-Wl,-Bstatic" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libzluda_dark_api-309235fa31e9aab1.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libthread_id-581b343fe573f8b0.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/liblz4_sys-ca4d797647e2a031.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/liblibc-8fdb7fcb9a4819e0.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libbitflags-5430a0b02c754ead.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libbit_vec-3e4c5620a30ac5e5.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libhip_common-f1ec58a835dcc419.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/liblibloading-a276f34f50629942.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libmemchr-5c4a7f1ebb9fd8aa.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/librustc_hash-dc1a96583e6f3d89.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libcapnp-3dfebef2d6d75177.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libgoblin-92470a2530cf50d1.rlib" 
"/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libplain-094ec0584b4e1528.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/liblog-ccd73e91eb01d483.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libscroll-bd3a3726a47e373d.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/librusqlite-10164dcf0927e589.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libbitflags-9dda00baa75ae047.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libsmallvec-b0b884bc679f621a.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libserde_json-45d32efa3cf75a1d.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libitoa-118cb57f35bbd8fc.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libryu-462642de0f633fe4.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libserde-adf6c4e5a8285439.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libfallible_streaming_iterator-4cc98b876b55a4ce.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libfallible_iterator-68abc77b7f56c66f.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libhashlink-396894a1c08b212c.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libhashbrown-4d90f8b654fad04f.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libahash-2c0cabd2d851f5a7.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libonce_cell-76d12902c72ad724.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libcfg_if-3c02971f6388fdf8.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libzerocopy-ab20fc1c86e5a482.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/liballocator_api2-4d43a7f0bdc365e3.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/liblibsqlite3_sys-2b4f0e9c94c331a6.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libitertools-e6ec8afd82417cdf.rlib" 
"/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libeither-d65617cfa6ca3fd6.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libhip_runtime_sys-3f6b8fa5c8000e15.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libcuda_types-f36bb99fa8b90530.rlib" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libhipblaslt_sys-a74f769aef006537.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-bf2160fadd66da13.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libpanic_unwind-56ffc7344a3fa9ec.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libobject-45b585256bbdad6f.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libmemchr-4ff2a73349a27351.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libaddr2line-13b90ddabcfcdff7.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libgimli-d24d3e21c4b6d183.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_demangle-09fdf503f250d6a7.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd_detect-4c6d792e86d76f74.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libhashbrown-67ad1ad36ffec836.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_std_workspace_alloc-ab6e54dadf25bfd1.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libminiz_oxide-0f5ce74ab0128e5b.rlib" 
"/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libadler-c41e85cdd0e0dfff.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libunwind-b9b23c28f438e60f.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcfg_if-6771b1ae9cbfc2c9.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/liblibc-84992ee57e15ddc0.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/liballoc-bc9174b398261284.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc_std_workspace_core-ce683bfa6346b7ff.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcore-baa5449bbf3e5ae7.rlib" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib/libcompiler_builtins-e8cdafc9faa29ecd.rlib" "-Wl,-Bdynamic" "-ldl" "-lamdhip64" "-lhipblaslt" "-lgcc_s" "-lutil" "-lrt" "-lpthread" "-lm" "-ldl" "-lc" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-L" "/nix/store/8x1k9hmn5mgzaad34p1i37ii7n0l16yc-rustc-1.75.0/lib/rustlib/x86_64-unknown-linux-gnu/lib" "-o" "/build/ZLUDA/target/x86_64-unknown-linux-gnu/release/deps/libcublasLt.so" "-Wl,--gc-sections" "-shared" "-Wl,-z,relro,-z,now" "-Wl,-O1" "-nodefaultlibs"
= note: /nix/store/2ab5740x0cy1d74qvbpl5s28qikmppl5-binutils-2.40/bin/ld: cannot find -lhipblaslt: No such file or directory
collect2: error: ld returned 1 exit status
this library likely is generated by |
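To narrow down a `cannot find -l<name>` error like the one above, it can help to check which of the `-L` directories on the link line actually contains the library. Below is a minimal sketch; `find_lib` is a hypothetical helper and the directories are temporary stand-ins. In the log above the relevant search path is `/opt/rocm/lib/`, which would not normally exist inside a Nix build sandbox.

```shell
set -euo pipefail

# Hypothetical helper: print the first directory that provides lib<name>.so
# or lib<name>.a, roughly mirroring how the linker resolves -l<name>.
# usage: find_lib <name> <dir>...
find_lib() {
  local name="$1"
  shift
  local dir
  for dir in "$@"; do
    if [ -e "$dir/lib${name}.so" ] || [ -e "$dir/lib${name}.a" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Stand-in search directories; only the second one holds the library.
search_root="$(mktemp -d)"
mkdir -p "$search_root/a" "$search_root/b"
: > "$search_root/b/libdemo.so"

find_lib demo "$search_root/a" "$search_root/b"
find_lib missing "$search_root/a" "$search_root/b" || echo "libmissing: not found in any search directory"
```

Running the same kind of check against the real `-L` arguments would show whether the library is missing entirely or just not on the search path.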
Ah, I see a request for hipblaslt was added in #197885. Should we aim to release this without |
If some non-trivial part of zluda is usable without hipblaslt we should release it. We can update the comment with a link to the issue(s) |
Modified the comment to point at the issue, or specifically your comment there. It's a bit of a weird one to follow, as it's a long-running issue that's not closed when the package is implemented, but it's still the closest there is, so it should be ok. Based on the earlier message by @deftdawg this does seem to already provide enough functionality to work in Blender, for example. Sounds like value to me :) The main risk I can see is that some users might be confused or make wrong assumptions about zluda due to blaslt not being there, but as you said it's likely better to release, as it provides value already. |
Can't afford to test locally rn because of #301937 and I suspect Ofborg would fail as well. @errnoh can you confirm zluda is in a buildable state rn? Also please update the commit messages ("working nixpkgs build" etc) as per the manual |
Builds fine on the branch I'm developing it on, but before answering you I tried with current nixpkgs head, and the build that takes 1-2 minutes is now taking hours with 24 cores at 100%, so it's hard to say definitively that it's working there. Is this related to #301937 and wth is going on? edit: looking at the log it's trying to compile
[errnoh@desk:~/dev/nixpkgs]$ nix-build -A zluda
these 3 derivations will be built:
/nix/store/ycw27kbfiymaxn661ss60xzfw6p59m2s-composable_kernel-6.0.2.drv
/nix/store/wxxvxzz0rhwh1rzf4xrfpn3ghh485pgx-miopen-6.0.2.drv
/nix/store/jhxswf95dy2k6l7npj6fq95q6sy9yaz2-zluda-3.drv
building '/nix/store/ycw27kbfiymaxn661ss60xzfw6p59m2s-composable_kernel-6.0.2.drv'...
Running phase: unpackPhase
unpacking source archive /nix/store/awil5vbp1dhj6if3qf4iac8yw7jxkv1m-source
source root is source
Running phase: patchPhase
substituteStream(): WARNING: '--replace' is deprecated, use --replace-{fail,warn,quiet}. (file 'CMakeLists.txt')
Running phase: updateAutotoolsGnuConfigScriptsPhase
Running phase: configurePhase
fixing cmake files...
cmake flags: -DCMAKE_FIND_USE_SYSTEM_PACKAGE_REGISTRY=OFF -DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DCMAKE_INSTALL_LOCALEDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/share/locale -DCMAKE_INSTALL_LIBEXECDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/libexec -DCMAKE_INSTALL_LIBDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/lib -DCMAKE_INSTALL_DOCDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/share/doc/composable_kernel -DCMAKE_INSTALL_INFODIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/share/info -DCMAKE_INSTALL_MANDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/share/man -DCMAKE_INSTALL_OLDINCLUDEDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/include -DCMAKE_INSTALL_INCLUDEDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/include -DCMAKE_INSTALL_SBINDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/sbin -DCMAKE_INSTALL_BINDIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/bin -DCMAKE_INSTALL_NAME_DIR=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2/lib -DCMAKE_POLICY_DEFAULT_CMP0025=NEW -DCMAKE_OSX_SYSROOT= -DCMAKE_FIND_FRAMEWORK=LAST -DCMAKE_STRIP=/nix/store/m4d89r3829qzkxr98nzkfdfcs0z4h3cw-rocm-llvm-binutils-6.0.2/bin/strip -DCMAKE_RANLIB=/nix/store/yb0flwbqcwh66b80lyx0ifmxybqzklaj-rocm-llvm-clang-wrapper-6.0.2/bin/ranlib -DCMAKE_AR=/nix/store/yb0flwbqcwh66b80lyx0ifmxybqzklaj-rocm-llvm-clang-wrapper-6.0.2/bin/ar -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_INSTALL_PREFIX=/nix/store/0dxlnnlkcyqjwp38bs88xvf4gf5rk4j0-composable_kernel-6.0.2 -DCMAKE_C_COMPILER=hipcc -DCMAKE_CXX_COMPILER=hipcc
-- The C compiler identification is Clang 17.0.0
-- The CXX compiler identification is Clang 17.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /nix/store/z5lnja1kc4l9cwn99dm1jgghvgsxj0y4-clr-6.0.2/bin/hipcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /nix/store/z5lnja1kc4l9cwn99dm1jgghvgsxj0y4-clr-6.0.2/bin/hipcc - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /nix/store/y56vmnczakd9p0dsjl6jgnqrkqv04yxx-git-2.44.0/bin/git (found version "2.44.0")
fatal: not a git repository (or any of the parent directories): .git
GPU_TARGETS=
checking which targets are supported
-- Performing Test COMPILER_HAS_TARGET_ID_gfx908
-- Performing Test COMPILER_HAS_TARGET_ID_gfx908 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx940
-- Performing Test COMPILER_HAS_TARGET_ID_gfx940 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx941
-- Performing Test COMPILER_HAS_TARGET_ID_gfx941 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx942
-- Performing Test COMPILER_HAS_TARGET_ID_gfx942 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1030
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1030 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1100
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1100 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1101
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1101 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1102
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1102 - Success
Supported GPU_TARGETS= gfx908;gfx90a;gfx940;gfx941;gfx942;gfx1030;gfx1100;gfx1101;gfx1102
Building CK for the following targets: gfx908;gfx90a;gfx940;gfx941;gfx942;gfx1030;gfx1100;gfx1101;gfx1102
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
hip_version_flat=600000000
Adding the fno-offload-uniform-block compiler flag
CMAKE_CXX_COMPILER_ID: Clang
OpenMP_CXX_LIB_NAMES: libomp;libgomp;libiomp5
OpenMP_gomp_LIBRARY:
OpenMP_pthread_LIBRARY:
OpenMP_CXX_FLAGS: -fopenmp=libomp -Wno-unused-command-line-argument
-- Build with HIP 6.0.0
-- Clang tidy found: 17.0.0git
-- Clang tidy checks: *,-abseil-*,-android-cloexec-fopen,-cert-msc30-c,-bugprone-exception-escape,-bugprone-macro-parentheses,-cert-env33-c,-cert-msc32-c,-cert-msc50-cpp,-cert-msc51-cpp,-cert-dcl37-c,-cert-dcl51-cpp,-clang-analyzer-alpha.core.CastToStruct,-clang-analyzer-optin.performance.Padding,-clang-diagnostic-deprecated-declarations,-clang-diagnostic-extern-c-compat,-clang-diagnostic-unused-command-line-argument,-cppcoreguidelines-avoid-c-arrays,-cppcoreguidelines-avoid-magic-numbers,-cppcoreguidelines-explicit-virtual-functions,-cppcoreguidelines-init-variables,-cppcoreguidelines-macro-usage,-cppcoreguidelines-non-private-member-variables-in-classes,-cppcoreguidelines-pro-bounds-array-to-pointer-decay,-cppcoreguidelines-pro-bounds-constant-array-index,-cppcoreguidelines-pro-bounds-pointer-arithmetic,-cppcoreguidelines-pro-type-member-init,-cppcoreguidelines-pro-type-reinterpret-cast,-cppcoreguidelines-pro-type-union-access,-cppcoreguidelines-pro-type-vararg,-cppcoreguidelines-special-member-functions,-fuchsia-*,-google-explicit-constructor,-google-readability-braces-around-statements,-google-readability-todo,-google-runtime-int,-google-runtime-references,-hicpp-vararg,-hicpp-braces-around-statements,-hicpp-explicit-conversions,-hicpp-named-parameter,-hicpp-no-array-decay,-hicpp-avoid-c-arrays,-hicpp-signed-bitwise,-hicpp-special-member-functions,-hicpp-uppercase-literal-suffix,-hicpp-use-auto,-hicpp-use-equals-default,-hicpp-use-override,-llvm-header-guard,-llvm-include-order,-llvmlibc-restrict-system-libc-headers,-llvmlibc-callee-namespace,-llvmlibc-implementation-in-namespace,-llvm-else-after-return,-llvm-qualified-auto,-misc-misplaced-const,-misc-non-private-member-variables-in-classes,-misc-no-recursion,-modernize-avoid-bind,-modernize-avoid-c-arrays,-modernize-pass-by-value,-modernize-use-auto,-modernize-use-default-member-init,-modernize-use-equals-default,-modernize-use-trailing-return-type,-modernize-use-transparent-functors,-performance-unnecessary-va
lue-param,-readability-braces-around-statements,-readability-else-after-return,-readability-function-cognitive-complexity,-readability-isolate-declaration,-readability-magic-numbers,-readability-named-parameter,-readability-uppercase-literal-suffix,-readability-convert-member-functions-to-static,-readability-qualified-auto,-readability-redundant-string-init,-bugprone-narrowing-conversions,-cppcoreguidelines-narrowing-conversions,-altera-struct-pack-align,-cppcoreguidelines-prefer-member-initializer
CMAKE_CXX_FLAGS:
instance should be built for all types!
adding instance device_avg_pool3d_bwd_instance
instance should be built for all types!
adding instance device_batched_gemm_instance
instance should be built for all types!
adding instance device_batched_gemm_add_relu_gemm_add_instance
instance should be built for all types!
adding instance device_batched_gemm_bias_permute_instance
instance should be built for all types!
adding instance device_batched_gemm_gemm_instance
instance should be built for all types!
Found only dl instances, but DL_KERNELS is not set. Skipping.
instance should be built for all types!
adding instance device_batched_gemm_reduce_instance
instance should be built for all types!
adding instance device_batched_gemm_softmax_gemm_instance
instance should be built for all types!
adding instance device_batched_gemm_softmax_gemm_permute_instance
instance should be built for all types!
adding instance device_batchnorm_instance
instance should be built for all types!
adding instance device_column_to_image_instance
instance should be built for all types!
adding instance device_contraction_bilinear_instance
instance should be built for all types!
adding instance device_contraction_scale_instance
instance should be built for all types!
adding instance device_conv1d_bwd_data_instance
instance should be built for all types!
adding instance device_conv2d_bwd_data_instance
removing dl instance device_conv2d_bwd_data_dl_nhwc_kyxc_nhwk_f32_instance.cpp
removing dl instance device_conv2d_bwd_data_dl_nhwc_kyxc_nhwk_f16_instance.cpp
removing dl instance device_conv2d_bwd_data_dl_nhwc_kyxc_nhwk_int8_instance.cpp
instance should be built for all types!
adding instance device_conv2d_fwd_instance
instance should be built for all types!
adding instance device_conv2d_fwd_bias_relu_instance
instance should be built for all types!
adding instance device_conv2d_fwd_bias_relu_add_instance
instance should be built for all types!
adding instance device_conv3d_bwd_data_instance
instance should be built for all types!
adding instance device_elementwise_instance
instance should be built for all types!
adding instance device_elementwise_normalization_instance
instance should be built for all types!
adding instance device_gemm_instance
removing dl instance device_gemm_dl_f32_f32_f32_mk_kn_mn_instance.cpp
removing dl instance device_gemm_dl_f32_f32_f32_mk_nk_mn_instance.cpp
removing dl instance device_gemm_dl_f32_f32_f32_km_kn_mn_instance.cpp
removing dl instance device_gemm_dl_f32_f32_f32_km_nk_mn_instance.cpp
removing dl instance device_gemm_dl_f16_f16_f16_mk_kn_mn_instance.cpp
removing dl instance device_gemm_dl_f16_f16_f16_mk_kn_mn_irregular_instance.cpp
removing dl instance device_gemm_dl_f16_f16_f16_mk_nk_mn_instance.cpp
removing dl instance device_gemm_dl_f16_f16_f16_mk_nk_mn_irregular_instance.cpp
removing dl instance device_gemm_dl_f16_f16_f16_km_kn_mn_instance.cpp
removing dl instance device_gemm_dl_f16_f16_f16_km_kn_mn_irregular_instance.cpp
removing dl instance device_gemm_dl_f16_f16_f16_km_nk_mn_instance.cpp
removing dl instance device_gemm_dl_f16_f16_f16_km_nk_mn_irregular_instance.cpp
removing dl instance device_gemm_dl_i8_i8_i8_mk_kn_mn_instance.cpp
removing dl instance device_gemm_dl_i8_i8_i8_mk_kn_mn_irregular_instance.cpp
removing dl instance device_gemm_dl_i8_i8_i8_mk_nk_mn_instance.cpp
removing dl instance device_gemm_dl_i8_i8_i8_mk_nk_mn_irregular_instance.cpp
removing dl instance device_gemm_dl_i8_i8_i8_km_kn_mn_instance.cpp
removing dl instance device_gemm_dl_i8_i8_i8_km_kn_mn_irregular_instance.cpp
removing dl instance device_gemm_dl_i8_i8_i8_km_nk_mn_instance.cpp
removing dl instance device_gemm_dl_i8_i8_i8_km_nk_mn_irregular_instance.cpp
instance should be built for all types!
adding instance device_gemm_add_add_fastgelu_instance
instance should be built for all types!
adding instance device_gemm_add_fastgelu_instance
instance should be built for all types!
adding instance device_gemm_add_multiply_instance
instance should be built for all types!
adding instance device_gemm_add_relu_add_layernorm_instance
instance should be built for all types!
adding instance device_gemm_bias_add_reduce_instance
instance should be built for all types!
adding instance device_gemm_bilinear_instance
instance should be built for all types!
adding instance device_gemm_fastgelu_instance
instance should be built for all types!
adding instance device_gemm_multiply_add_instance
instance should be built for all types!
adding instance device_gemm_reduce_instance
instance should be built for all types!
adding instance device_gemm_splitk_instance
instance should be built for all types!
adding instance device_gemm_streamk_instance
instance should be built for all types!
adding instance device_grouped_conv1d_bwd_weight_instance
instance should be built for all types!
adding instance device_grouped_conv1d_fwd_instance
instance should be built for all types!
adding instance device_grouped_conv2d_bwd_data_instance
instance should be built for all types!
adding instance device_grouped_conv2d_bwd_weight_instance
instance should be built for all types!
adding instance device_grouped_conv2d_fwd_instance
removing dl instance dl/device_grouped_conv2d_fwd_dl_gnhwc_gkyxc_gnhwk_f16_instance.cpp
removing dl instance dl/device_grouped_conv2d_fwd_dl_gnhwc_gkyxc_gnhwk_f32_instance.cpp
removing dl instance dl/device_grouped_conv2d_fwd_dl_nhwgc_gkyxc_nhwgk_f16_instance.cpp
removing dl instance dl/device_grouped_conv2d_fwd_dl_nhwgc_gkyxc_nhwgk_f32_instance.cpp
instance should be built for all types!
adding instance device_grouped_conv3d_bwd_data_instance
instance should be built for all types!
adding instance device_grouped_conv3d_bwd_weight_instance
instance should be built for all types!
adding instance device_grouped_conv3d_fwd_instance
instance should be built for all types!
adding instance device_grouped_gemm_instance
instance should be built for all types!
adding instance device_grouped_gemm_bias_instance
instance should be built for all types!
adding instance device_grouped_gemm_fastgelu_instance
instance should be built for all types!
adding instance device_grouped_gemm_fixed_nk_instance
instance should be built for all types!
adding instance device_image_to_column_instance
instance should be built for all types!
adding instance device_max_pool_bwd_instance
instance should be built for all types!
adding instance device_normalization_instance
instance should be built for all types!
adding instance device_pool3d_fwd_instance
instance should be built for all types!
adding instance device_quantization_instance
removing dl instance conv2d_fwd/device_conv2d_dl_perlayer_quantization_int8_instance.cpp
removing dl instance conv2d_fwd/device_conv2d_dl_perchannel_quantization_int8_instance.cpp
removing dl instance conv2d_fwd/device_conv2d_dl_bias_perlayer_quantization_int8_instance.cpp
removing dl instance conv2d_fwd/device_conv2d_dl_bias_perchannel_quantization_int8_instance.cpp
removing dl instance gemm/device_gemm_quantization_dl_c_shuffle_i8_i8_i8_km_kn_mn_instance.cpp
removing dl instance gemm/device_gemm_quantization_dl_c_shuffle_i8_i8_i8_km_nk_mn_instance.cpp
removing dl instance gemm/device_gemm_quantization_dl_c_shuffle_i8_i8_i8_mk_kn_mn_instance.cpp
removing dl instance gemm/device_gemm_quantization_dl_c_shuffle_i8_i8_i8_mk_nk_mn_instance.cpp
instance should be built for all types!
adding instance device_reduce_instance
instance should be built for all types!
adding instance device_softmax_instance
-- Configuring done (13.2s)
-- Generating done (0.3s)
CMake Warning:
Manually-specified variables were not used by the project:
BUILD_TESTING
CMAKE_EXPORT_NO_PACKAGE_REGISTRY
CMAKE_POLICY_DEFAULT_CMP0025
-- Build files have been written to: /build/source/build
cmake: enabled parallel building
cmake: enabled parallel installing
Running phase: buildPhase
build flags: -j24 SHELL=/nix/store/a1s263pmsci9zykm5xcdf7x9rv26w6d5-bash-5.2p26/bin/bash
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f16_f16_instance_rank3_reduce1.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_gemm/CMakeFiles/device_batched_gemm_gemm_instance.dir/device_batched_gemm_gemm_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/avg_pool3d_bwd/CMakeFiles/device_avg_pool3d_bwd_instance.dir/device_avg_pool3d_bwd_ndhwc_f16_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f16_f16_f16_gmk_gkn_gmn_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_bias_permute/CMakeFiles/device_batched_gemm_bias_permute_instance.dir/device_batched_gemm_bias_permute_m2_n3_k1_xdl_c_shuffle_f16_f16_f16_f16_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_reduce/CMakeFiles/device_batched_gemm_reduce_instance.dir/device_batched_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_gmk_gkn_gmn_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm/CMakeFiles/device_batched_gemm_softmax_gemm_instance.dir/device_batched_gemm_softmax_gemm_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm_permute/CMakeFiles/device_batched_gemm_softmax_gemm_permute_instance.dir/device_batched_gemm_softmax_gemm_permute_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_kknn_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/column_to_image/CMakeFiles/device_column_to_image_instance.dir/device_column_to_image_nhwc_1d_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_forward_f16_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_kkn_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_bwd_data/CMakeFiles/device_conv2d_bwd_data_instance.dir/device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_f32_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_f16_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd_bias_relu/CMakeFiles/device_conv2d_fwd_bias_relu_instance.dir/device_conv2d_fwd_xdl_c_shuffle_bias_relu_nhwc_kyxc_nhwk_f16_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_add_relu_gemm_add/CMakeFiles/device_batched_gemm_add_relu_gemm_add_instance.dir/device_batched_gemm_add_relu_gemm_add_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd_bias_relu_add/CMakeFiles/device_conv2d_fwd_bias_relu_add_instance.dir/device_conv2d_fwd_xdl_c_shuffle_bias_relu_add_nhwc_kyxc_nhwk_f16_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/conv3d_bwd_data/CMakeFiles/device_conv3d_bwd_data_instance.dir/device_conv3d_bwd_data_xdl_ndhwc_kzyxc_ndhwk_f16_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/elementwise_normalization/CMakeFiles/device_elementwise_normalization_instance.dir/device_elementwise_normalization_f16_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/elementwise/CMakeFiles/device_elementwise_instance.dir/device_normalize_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_add_fastgelu/CMakeFiles/device_gemm_add_add_fastgelu_instance.dir/device_gemm_add_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_f16_km_kn_mn_mn_mn_instance.cpp.o
[ 2%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd/CMakeFiles/device_conv2d_fwd_instance.dir/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_fastgelu/CMakeFiles/device_gemm_add_fastgelu_instance.dir/device_gemm_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_km_kn_mn_mn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f64_f64_f64_mk_kn_mn_instance.cpp.o
[ 4%] Built target device_elementwise_instance
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f64_f64_f64_mk_nk_mn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/avg_pool3d_bwd/CMakeFiles/device_avg_pool3d_bwd_instance.dir/device_avg_pool3d_bwd_ndhwc_bf16_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f16_f16_instance_rank3_reduce2.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f16_f16_instance_rank3_reduce3.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f64_f64_f64_km_kn_mn_instance.cpp.o
[ 4%] Built target device_elementwise_normalization_instance
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f64_f64_f64_km_nk_mn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/avg_pool3d_bwd/CMakeFiles/device_avg_pool3d_bwd_instance.dir/device_avg_pool3d_bwd_ndhwc_f32_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/column_to_image/CMakeFiles/device_column_to_image_instance.dir/device_column_to_image_nhwc_2d_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_knn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f16_f16_instance_rank4_reduce1.cpp.o
[ 4%] Built target device_avg_pool3d_bwd_instance
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_mkn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f16_f16_f16_gmk_gnk_gmn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f16_f16_f16_gkm_gkn_gmn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f32_f32_f32_mk_kn_mn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_knnn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_forward_f32_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_gemm/CMakeFiles/device_batched_gemm_gemm_instance.dir/device_batched_gemm_gemm_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gon_gmo_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f16_f16_instance_rank4_reduce2.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_reduce/CMakeFiles/device_batched_gemm_reduce_instance.dir/device_batched_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_gmk_gnk_gmn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/column_to_image/CMakeFiles/device_column_to_image_instance.dir/device_column_to_image_nhwc_3d_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_add_relu_gemm_add/CMakeFiles/device_batched_gemm_add_relu_gemm_add_instance.dir/device_batched_gemm_add_relu_gemm_add_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gon_gmo_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd/CMakeFiles/device_conv2d_fwd_instance.dir/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f16_f16_instance_rank4_reduce3.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_mnn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f32_f32_f32_mk_nk_mn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_bwd_data/CMakeFiles/device_conv2d_bwd_data_instance.dir/device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp.o
[ 4%] Built target device_batched_gemm_bias_permute_instance
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm_permute/CMakeFiles/device_batched_gemm_softmax_gemm_permute_instance.dir/device_batched_gemm_bias_softmax_gemm_permute_xdl_cshuffle_f16_f16_f16_f16_gmk_gnk_gno_gmo_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f16_f16_instance_rank4_reduce4.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_f32_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_add_fastgelu/CMakeFiles/device_gemm_add_add_fastgelu_instance.dir/device_gemm_add_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_f16_km_nk_mn_mn_mn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_fastgelu/CMakeFiles/device_gemm_add_fastgelu_instance.dir/device_gemm_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_km_nk_mn_mn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_fastgelu/CMakeFiles/device_gemm_add_fastgelu_instance.dir/device_gemm_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_mk_kn_mn_mn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_fastgelu/CMakeFiles/device_gemm_add_fastgelu_instance.dir/device_gemm_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_mk_nk_mn_mn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_mknn_instance.cpp.o
[ 4%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f32_f32_f32_f32_mnnn_instance.cpp.o
[ 6%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_forward_bf16_instance.cpp.o
[ 6%] Built target device_column_to_image_instance
[ 6%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_f64_kknn_instance.cpp.o
[ 6%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f16_f16_f16_gkm_gnk_gmn_instance.cpp.o
[ 8%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_reduce/CMakeFiles/device_batched_gemm_reduce_instance.dir/device_batched_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_gkm_gkn_gmn_instance.cpp.o
[ 8%] Built target device_batched_gemm_gemm_instance
[ 10%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_f64_knnn_instance.cpp.o
[ 12%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f32_f32_instance_rank3_reduce1.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/conv3d_bwd_data/CMakeFiles/device_conv3d_bwd_data_instance.dir/device_conv3d_bwd_data_xdl_ndhwc_kzyxc_ndhwk_f32_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f32_f32_f32_km_kn_mn_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_kkn_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f32_f32_instance_rank3_reduce2.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_bf16_bf16_bf16_gmk_gkn_gmn_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd/CMakeFiles/device_conv2d_fwd_instance.dir/device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f32_f32_instance_rank3_reduce3.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f32_f32_instance_rank4_reduce1.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_forward_f64_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_backward_f16_instance.cpp.o
[ 14%] Built target device_batched_gemm_add_relu_gemm_add_instance
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_backward_f32_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_add_fastgelu/CMakeFiles/device_gemm_add_add_fastgelu_instance.dir/device_gemm_add_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_f16_mk_kn_mn_mn_mn_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_backward_bf16_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f32_f32_f32_km_nk_mn_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_f64_mknn_instance.cpp.o
[ 14%] Built target device_conv2d_fwd_bias_relu_instance
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd/CMakeFiles/device_conv2d_fwd_instance.dir/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_bf16_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_add_fastgelu/CMakeFiles/device_gemm_add_add_fastgelu_instance.dir/device_gemm_add_add_fastgelu_xdl_c_shuffle_f16_f16_f16_f16_f16_mk_nk_mn_mn_mn_instance.cpp.o
[ 14%] Built target device_conv2d_fwd_bias_relu_add_instance
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_bf16_bf16_bf16_gmk_gnk_gmn_instance.cpp.o
[ 14%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_knn_instance.cpp.o
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_mkn_instance.cpp.o
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f32_f32_instance_rank4_reduce2.cpp.o
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_reduce/CMakeFiles/device_batched_gemm_reduce_instance.dir/device_batched_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_gkm_gnk_gmn_instance.cpp.o
[ 16%] Built target device_batched_gemm_softmax_gemm_instance
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f32_f32_instance_rank4_reduce3.cpp.o
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_bf16_instance.cpp.o
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_scale/CMakeFiles/device_contraction_scale_instance.dir/device_contraction_scale_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_mnn_instance.cpp.o
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_bwd_data/CMakeFiles/device_conv2d_bwd_data_instance.dir/device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_f16_instance.cpp.o
[ 16%] Built target device_gemm_add_fastgelu_instance
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_multiply/CMakeFiles/device_gemm_add_multiply_instance.dir/device_gemm_add_multiply_xdl_c_shuffle_f16_f16_f16_f16_f16_km_kn_mn_mn_mn_instance.cpp.o
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_multiply/CMakeFiles/device_gemm_add_multiply_instance.dir/device_gemm_add_multiply_xdl_c_shuffle_f16_f16_f16_f16_f16_km_nk_mn_mn_mn_instance.cpp.o
[ 16%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm_permute/CMakeFiles/device_batched_gemm_softmax_gemm_permute_instance.dir/device_batched_gemm_softmax_gemm_permute_xdl_cshuffle_bf16_bf16_bf16_bf16_gmk_gnk_gno_gmo_instance.cpp.o
[ 18%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f32_f32_f32_mk_kn_mn_instance.cpp.o
[ 18%] Building CXX object library/src/tensor_operation_instance/gpu/softmax/CMakeFiles/device_softmax_instance.dir/device_softmax_f32_f32_instance_rank4_reduce4.cpp.o
[ 18%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_relu_add_layernorm/CMakeFiles/device_gemm_add_relu_add_layernorm_instance.dir/device_gemm_add_relu_add_xdl_c_shuffle_layernorm_f16_km_kn_mn_mn_mn_instance.cpp.o
[ 18%] Building CXX object library/src/tensor_operation_instance/gpu/contraction_bilinear/CMakeFiles/device_contraction_bilinear_instance.dir/device_contraction_bilinear_m2_n2_k2_xdl_c_shuffle_f64_f64_f64_f64_mnnn_instance.cpp.o
[ 20%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_relu_add_layernorm/CMakeFiles/device_gemm_add_relu_add_layernorm_instance.dir/device_gemm_add_relu_add_xdl_c_shuffle_layernorm_f16_km_nk_mn_mn_mn_instance.cpp.o
[ 20%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_relu_add_layernorm/CMakeFiles/device_gemm_add_relu_add_layernorm_instance.dir/device_gemm_add_relu_add_xdl_c_shuffle_layernorm_f16_mk_kn_mn_mn_mn_instance.cpp.o
[ 20%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_multiply/CMakeFiles/device_gemm_add_multiply_instance.dir/device_gemm_add_multiply_xdl_c_shuffle_f16_f16_f16_f16_f16_mk_kn_mn_mn_mn_instance.cpp.o
[ 20%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_backward_f64_instance.cpp.o
[ 20%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_bf16_bf16_bf16_gkm_gkn_gmn_instance.cpp.o
[ 20%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm_softmax_gemm_permute/CMakeFiles/device_batched_gemm_softmax_gemm_permute_instance.dir/device_batched_gemm_bias_softmax_gemm_permute_xdl_cshuffle_bf16_bf16_bf16_bf16_gmk_gnk_gno_gmo_instance.cpp.o
[ 20%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_fwd/CMakeFiles/device_conv2d_fwd_instance.dir/device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instance.cpp.o
[ 20%] Built target device_softmax_instance
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/conv2d_bwd_data/CMakeFiles/device_conv2d_bwd_data_instance.dir/device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_int8_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/conv3d_bwd_data/CMakeFiles/device_conv3d_bwd_data_instance.dir/device_conv3d_bwd_data_xdl_ndhwc_kzyxc_ndhwk_bf16_instance.cpp.o
[ 22%] Built target device_contraction_scale_instance
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/conv3d_bwd_data/CMakeFiles/device_conv3d_bwd_data_instance.dir/device_conv3d_bwd_data_xdl_ndhwc_kzyxc_ndhwk_int8_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f32_f32_f32_mk_nk_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bias_add_reduce/CMakeFiles/device_gemm_bias_add_reduce_instance.dir/device_gemm_bias_add_mean_squaremean_xdl_cshuffle_f16_f16_f16_f32_f32_mk_kn_mn_instance.cpp.o
[ 22%] Built target device_batched_gemm_reduce_instance
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bias_add_reduce/CMakeFiles/device_gemm_bias_add_reduce_instance.dir/device_gemm_bias_add_mean_squaremean_xdl_cshuffle_f16_f16_f16_f32_f32_mk_nk_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_multiply/CMakeFiles/device_gemm_add_multiply_instance.dir/device_gemm_add_multiply_xdl_c_shuffle_f16_f16_f16_f16_f16_mk_nk_mn_mn_mn_instance.cpp.o
[ 22%] Built target device_contraction_bilinear_instance
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_bf16_bf16_bf16_gkm_gnk_gmn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/conv1d_bwd_data/CMakeFiles/device_conv1d_bwd_data_instance.dir/device_conv1d_bwd_data_xdl_nwc_kxc_nwk_int8_instance.cpp.o
[ 22%] Built target device_gemm_add_add_fastgelu_instance
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bias_add_reduce/CMakeFiles/device_gemm_bias_add_reduce_instance.dir/device_gemm_bias_add_mean_squaremean_xdl_cshuffle_f16_f16_f16_f32_f32_km_kn_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_infer_f16_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f32_f32_f32_km_kn_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_add_relu_add_layernorm/CMakeFiles/device_gemm_add_relu_add_layernorm_instance.dir/device_gemm_add_relu_add_xdl_c_shuffle_layernorm_f16_mk_nk_mn_mn_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f32_f32_f32_km_nk_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_infer_f32_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_kn_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_infer_bf16_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bilinear/CMakeFiles/device_gemm_bilinear_instance.dir/device_gemm_bilinear_xdl_c_shuffle_f16_f16_f16_f16_km_kn_mn_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bilinear/CMakeFiles/device_gemm_bilinear_instance.dir/device_gemm_bilinear_xdl_c_shuffle_f16_f16_f16_f16_km_nk_mn_mn_instance.cpp.o
[ 22%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f32_f32_f32_gmk_gkn_gmn_instance.cpp.o
[ 24%] Building CXX object library/src/tensor_operation_instance/gpu/batchnorm/CMakeFiles/device_batchnorm_instance.dir/device_batchnorm_infer_f64_instance.cpp.o
[ 24%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_fastgelu/CMakeFiles/device_gemm_fastgelu_instance.dir/device_gemm_fastgelu_xdl_c_shuffle_f16_f16_f16_km_kn_mn_instance.cpp.o
[ 24%] Built target device_batchnorm_instance
[ 24%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bias_add_reduce/CMakeFiles/device_gemm_bias_add_reduce_instance.dir/device_gemm_bias_add_mean_squaremean_xdl_cshuffle_f16_f16_f16_f32_f32_km_nk_mn_instance.cpp.o
[ 24%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_nk_mn_instance.cpp.o
[ 24%] Built target device_conv2d_fwd_instance
[ 26%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bilinear/CMakeFiles/device_gemm_bilinear_instance.dir/device_gemm_bilinear_xdl_c_shuffle_f16_f16_f16_f16_mk_kn_mn_mn_instance.cpp.o
[ 26%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_fastgelu/CMakeFiles/device_gemm_fastgelu_instance.dir/device_gemm_fastgelu_xdl_c_shuffle_f16_f16_f16_km_nk_mn_instance.cpp.o
[ 26%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_fastgelu/CMakeFiles/device_gemm_fastgelu_instance.dir/device_gemm_fastgelu_xdl_c_shuffle_f16_f16_f16_mk_kn_mn_instance.cpp.o
[ 28%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_fastgelu/CMakeFiles/device_gemm_fastgelu_instance.dir/device_gemm_fastgelu_xdl_c_shuffle_f16_f16_f16_mk_nk_mn_instance.cpp.o
[ 28%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_multiply_add/CMakeFiles/device_gemm_multiply_add_instance.dir/device_gemm_multiply_add_xdl_c_shuffle_f16_f16_f16_f16_f16_mk_kn_mn_mn_mn_instance.cpp.o
[ 30%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f32_f32_f32_gmk_gnk_gmn_instance.cpp.o
[ 30%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_multiply_add/CMakeFiles/device_gemm_multiply_add_instance.dir/device_gemm_multiply_add_xdl_c_shuffle_f16_f16_f16_f16_f16_mk_nk_mn_mn_mn_instance.cpp.o
[ 30%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bilinear/CMakeFiles/device_gemm_bilinear_instance.dir/device_gemm_bilinear_xdl_c_shuffle_f16_f16_f16_f16_mk_nk_mn_mn_instance.cpp.o
[ 30%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_mk_kn_mn_instance.cpp.o
[ 30%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bilinear/CMakeFiles/device_gemm_bilinear_instance.dir/device_gemm_bilinear_wmma_c_shuffle_i8_i8_i8_i8_km_kn_mn_mn_instance.cpp.o
[ 30%] Built target device_conv2d_bwd_data_instance
[ 30%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bilinear/CMakeFiles/device_gemm_bilinear_instance.dir/device_gemm_bilinear_wmma_c_shuffle_i8_i8_i8_i8_km_nk_mn_mn_instance.cpp.o
[ 30%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bilinear/CMakeFiles/device_gemm_bilinear_instance.dir/device_gemm_bilinear_wmma_c_shuffle_i8_i8_i8_i8_mk_kn_mn_mn_instance.cpp.o
[ 30%] Built target device_gemm_add_multiply_instance
[ 30%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_bilinear/CMakeFiles/device_gemm_bilinear_instance.dir/device_gemm_bilinear_wmma_c_shuffle_i8_i8_i8_i8_mk_nk_mn_mn_instance.cpp.o
[ 30%] Built target device_conv3d_bwd_data_instance
[ 32%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_multiply_add/CMakeFiles/device_gemm_multiply_add_instance.dir/device_gemm_multiply_add_xdl_c_shuffle_f16_f8_f32_f32_f16_mk_kn_mn_mn_mn_instance.cpp.o
[ 32%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_multiply_add/CMakeFiles/device_gemm_multiply_add_instance.dir/device_gemm_multiply_add_xdl_c_shuffle_f16_f8_f32_f32_f16_mk_nk_mn_mn_mn_instance.cpp.o
[ 32%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_mk_nk_mn_instance.cpp.o
[ 32%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f32_f32_f32_gkm_gkn_gmn_instance.cpp.o
[ 32%] Built target device_conv1d_bwd_data_instance
[ 32%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_kn_mn_irregular_instance.cpp.o
[ 32%] Built target device_gemm_bias_add_reduce_instance
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_km_nk_mn_irregular_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_mk_kn_mn_irregular_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_reduce/CMakeFiles/device_gemm_reduce_instance.dir/device_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_mk_kn_mn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_f32_f32_f32_gkm_gnk_gmn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_dpp_f16_f16_f16_mk_nk_mn_irregular_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f32_f32_f32_mk_kn_mn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f32_f32_f32_mk_nk_mn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_streamk/CMakeFiles/device_gemm_streamk_instance.dir/device_gemm_xdl_streamk_f16_f16_f16_mk_kn_mn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_reduce/CMakeFiles/device_gemm_reduce_instance.dir/device_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_mk_nk_mn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_int8_int8_int8_gmk_gkn_gmn_instance.cpp.o
[ 34%] Built target device_gemm_fastgelu_instance
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_reduce/CMakeFiles/device_gemm_reduce_instance.dir/device_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_km_kn_mn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_reduce/CMakeFiles/device_gemm_reduce_instance.dir/device_gemm_reduce_xdl_cshuffle_f16_f16_f16_f32_f32_km_nk_mn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f16_f16_f16_mk_kn_mn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_int8_int8_int8_gmk_gnk_gmn_instance.cpp.o
[ 34%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f32_f32_f32_km_kn_mn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f32_f32_f32_km_nk_mn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_f16_f16_mk_kn_mn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f16_f16_f16_mk_nk_mn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_f16_f16_mk_nk_mn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv1d_bwd_weight/CMakeFiles/device_grouped_conv1d_bwd_weight_instance.dir/device_grouped_conv1d_bwd_weight_xdl_gnwc_gkxc_gnwk_f16_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv1d_bwd_weight/CMakeFiles/device_grouped_conv1d_bwd_weight_instance.dir/device_grouped_conv1d_bwd_weight_xdl_gnwc_gkxc_gnwk_f32_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f16_f16_f16_km_kn_mn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_int8_int8_int8_gkm_gkn_gmn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/CMakeFiles/device_grouped_conv1d_fwd_instance.dir/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_bf16_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/batched_gemm/CMakeFiles/device_batched_gemm_instance.dir/device_batched_gemm_xdl_int8_int8_int8_gkm_gnk_gmn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_f16_f16_km_kn_mn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_f16_f16_km_nk_mn_instance.cpp.o
[ 36%] Built target device_gemm_multiply_add_instance
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_fp8_f16_f16_mk_kn_mn_instance.cpp.o
[ 36%] Built target device_gemm_bilinear_instance
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_fp8_f16_f16_mk_nk_mn_instance.cpp.o
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/xdl/device_grouped_conv2d_bwd_data_xdl_gnhwc_gkyxc_gnhwk_f16_instance.cpp.o
[ 36%] Built target device_gemm_add_relu_add_layernorm_instance
[ 36%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_f16_f16_f16_km_nk_mn_instance.cpp.o
[ 38%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/CMakeFiles/device_grouped_conv1d_fwd_instance.dir/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_f16_instance.cpp.o
[ 38%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/xdl/device_grouped_conv2d_bwd_data_xdl_gnhwc_gkyxc_gnhwk_bf16_instance.cpp.o
[ 38%] Built target device_gemm_streamk_instance
[ 38%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/xdl/device_grouped_conv2d_bwd_data_xdl_gnhwc_gkyxc_gnhwk_f32_instance.cpp.o
[ 38%] Built target device_gemm_reduce_instance
[ 38%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv1d_bwd_weight/CMakeFiles/device_grouped_conv1d_bwd_weight_instance.dir/device_grouped_conv1d_bwd_weight_xdl_gnwc_gkxc_gnwk_bf16_instance.cpp.o
[ 38%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_c_shuffle_2_stage_f16_f16_f16_mk_nk_mn_instance.cpp.o
[ 38%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_add_instance.cpp.o
[ 38%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_fp8_f16_f16_km_kn_mn_instance.cpp.o
[ 38%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/xdl/device_grouped_conv2d_bwd_data_xdl_nhwgc_gkyxc_nhwgk_f16_instance.cpp.o
[ 40%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_default_pipeline_v1_instance.cpp.o
[ 40%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/CMakeFiles/device_grouped_conv1d_fwd_instance.dir/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_f32_instance.cpp.o
[ 40%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_weight/CMakeFiles/device_grouped_conv2d_bwd_weight_instance.dir/device_grouped_conv2d_bwd_weight_xdl_gnhwc_gkyxc_gnhwk_f16_instance.cpp.o
[ 40%] Built target device_batched_gemm_softmax_gemm_permute_instance
[ 42%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_weight/CMakeFiles/device_grouped_conv2d_bwd_weight_instance.dir/device_grouped_conv2d_bwd_weight_xdl_gnhwc_gkyxc_gnhwk_f32_instance.cpp.o
[ 42%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_weight/CMakeFiles/device_grouped_conv2d_bwd_weight_instance.dir/device_grouped_conv2d_bwd_weight_xdl_gnhwc_gkyxc_gnhwk_bf16_instance.cpp.o
[ 42%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_default_pipeline_v2_instance.cpp.o
[ 42%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv1d_fwd/CMakeFiles/device_grouped_conv1d_fwd_instance.dir/xdl/device_grouped_conv1d_fwd_xdl_gnwc_gkxc_gnwk_int8_instance.cpp.o
[ 42%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/xdl/device_grouped_conv2d_bwd_data_xdl_nhwgc_gkyxc_nhwgk_bf16_instance.cpp.o
[ 42%] Built target device_batched_gemm_instance
[ 42%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/xdl/device_grouped_conv2d_bwd_data_xdl_nhwgc_gkyxc_nhwgk_f32_instance.cpp.o
[ 44%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/wmma/device_grouped_conv2d_bwd_data_wmma_gnhwc_gkyxc_gnhwk_f16_1x1s1p0_instance.cpp.o
[ 44%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_fp8_f16_f16_km_nk_mn_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_fp8_f16_mk_kn_mn_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_bf16_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_f16_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/device_grouped_conv2d_fwd_xdl_gnhwc_gkyxc_gnhwk_f32_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_fp8_f16_mk_nk_mn_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/wmma/device_grouped_conv2d_bwd_data_wmma_nhwgc_gkyxc_nhwgk_f16_1x1s1p0_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/wmma/device_grouped_conv2d_bwd_data_wmma_gnhwc_gkyxc_gnhwk_i8_1x1s1p0_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/wmma/device_grouped_conv2d_bwd_data_wmma_nhwgc_gkyxc_nhwgk_i8_1x1s1p0_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_default_pipeline_v2_opt_instance.cpp.o
[ 46%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_interwave_pipeline_v1_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_data/CMakeFiles/device_grouped_conv3d_bwd_data_instance.dir/xdl/device_grouped_conv3d_bwd_data_xdl_gndhwc_gkzyxc_gndhwk_f16_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_data/CMakeFiles/device_grouped_conv3d_bwd_data_instance.dir/xdl/device_grouped_conv3d_bwd_data_xdl_gndhwc_gkzyxc_gndhwk_bf16_instance.cpp.o
[ 48%] Built target device_grouped_conv1d_bwd_weight_instance
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_data/CMakeFiles/device_grouped_conv3d_bwd_data_instance.dir/xdl/device_grouped_conv3d_bwd_data_xdl_gndhwc_gkzyxc_gndhwk_f32_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_weight/CMakeFiles/device_grouped_conv2d_bwd_weight_instance.dir/device_grouped_conv2d_bwd_weight_xdl_nhwgc_gkyxc_nhwgk_f16_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/wmma/device_grouped_conv2d_bwd_data_wmma_gnhwc_gkyxc_gnhwk_f16_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_fp8_f16_km_kn_mn_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/gemm_splitk/CMakeFiles/device_gemm_splitk_instance.dir/device_gemm_xdl_splitk_f16_fp8_f16_km_nk_mn_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_irregular_default_pipeline_v1_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_irregular_default_pipeline_v2_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight/CMakeFiles/device_grouped_conv3d_bwd_weight_instance.dir/device_grouped_conv3d_bwd_weight_xdl_gndhwc_gkzyxc_gndhwk_f16_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_data/CMakeFiles/device_grouped_conv2d_bwd_data_instance.dir/wmma/device_grouped_conv2d_bwd_data_wmma_nhwgc_gkyxc_nhwgk_f16_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_kn_mn_irregular_interwave_pipeline_v1_instance.cpp.o
[ 48%] Built target device_grouped_conv1d_fwd_instance
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_add_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight/CMakeFiles/device_grouped_conv3d_bwd_weight_instance.dir/device_grouped_conv3d_bwd_weight_xdl_gndhwc_gkzyxc_gndhwk_f32_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_data/CMakeFiles/device_grouped_conv3d_bwd_data_instance.dir/xdl/device_grouped_conv3d_bwd_data_xdl_ndhwgc_gkzyxc_ndhwgk_f16_instance.cpp.o
[ 48%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_default_pipeline_v1_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_default_pipeline_v2_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd/CMakeFiles/device_grouped_conv2d_fwd_instance.dir/xdl/device_grouped_conv2d_fwd_xdl_nhwgc_gkyxc_nhwgk_bf16_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_bwd_weight/CMakeFiles/device_grouped_conv3d_bwd_weight_instance.dir/device_grouped_conv3d_bwd_weight_xdl_gndhwc_gkzyxc_gndhwk_bf16_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_weight/CMakeFiles/device_grouped_conv2d_bwd_weight_instance.dir/device_grouped_conv2d_bwd_weight_xdl_nhwgc_gkyxc_nhwgk_f32_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv2d_bwd_weight/CMakeFiles/device_grouped_conv2d_bwd_weight_instance.dir/device_grouped_conv2d_bwd_weight_xdl_nhwgc_gkyxc_nhwgk_bf16_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/CMakeFiles/device_grouped_conv3d_fwd_instance.dir/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_bf16_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd/CMakeFiles/device_grouped_conv3d_fwd_instance.dir/xdl/device_grouped_conv3d_fwd_xdl_gndhwc_gkzyxc_gndhwk_f16_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_default_pipeline_v2_opt_instance.cpp.o
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/grouped_gemm/CMakeFiles/device_grouped_gemm_instance.dir/device_grouped_gemm_xdl_f16_f16_f16_mk_kn_mn_instance.cpp.o
[ 50%] Built target device_gemm_splitk_instance
[ 50%] Building CXX object library/src/tensor_operation_instance/gpu/gemm/CMakeFiles/device_gemm_instance.dir/device_gemm_xdl_f16_f16_f16/km_nk_mn_interwave_pipeline_v1_instance.cpp.o
^Cerror: interrupted by the user
That's completely different from the build from my dev branch. |
@errnoh that's likely just some transitive dep which hydra simply hasn't built yet. Master is not guaranteed to be in the binary cache; it's the development trunk basically. Use a channel such as nixpkgs-unstable or nixos-unstable as your base; that's good enough usually. |
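A minimal sketch of that suggestion, assuming the package file is evaluated from a local checkout. The `./zluda/package.nix` path is hypothetical; the tarball URL is just GitHub's archive link for the `nixos-unstable` branch. The point is only that the package set comes from a channel branch rather than master, so heavy dependencies are more likely to be in the binary cache already:

```nix
# Sketch only: base the build on a channel branch instead of master.
let
  # Unpinned tarball of the nixos-unstable branch; use nixpkgs-unstable
  # here instead if that suits you better.
  nixpkgs = fetchTarball "https://github.com/NixOS/nixpkgs/archive/nixos-unstable.tar.gz";
  pkgs = import nixpkgs { };
in
# Hypothetical path: point this at the package file in your own checkout.
pkgs.callPackage ./zluda/package.nix { }
```

Saved as e.g. `default.nix`, this can be built with `nix-build`. Since the tarball is not pinned by hash, the result will move along with the channel.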
I concur with @SomeoneSerge, this is working for one real-world use-case already; let's ship it.
Some minor nits.
Also please squash the commits into logical units and title them according to the contributor's manual.
I linked #301937 but actually it's a separate issue: Hydra discards |
I can run a nixpkgs-review tomorrow |
Result of 1 package built:
|
From my point of view this looks good. I'm not sure how I can test it beyond building, but others seem to have done that.
So after some squashing this should be good to go.
Newest version with minor changes based on previous round of comments. Now squashed into a single commit. |
with import (builtins.getFlake github:errnoh/nixpkgs/add-zluda) { config.allowUnfree = true; };
let
  saxpy' = runCommand "saxpy-zluda" { nativeBuildInputs = [ makeWrapper ]; } ''
    mkdir "$out/bin" -p
    args=(
      makeWrapper
      ${lib.getExe cudaPackages.saxpy}
      "$out/bin/$name"
      --prefix LD_LIBRARY_PATH : "${lib.getLib zluda}/lib"
    )
    "''${args[@]}"
  '';
in
(singularity-tools.buildImage rec {
  name = "zluda";
  contents = [
    cudaPackages.saxpy
    saxpy'
  ];
  diskSize = 1024 * 64;
  memSize = diskSize;
}) // {
  passthru = { inherit saxpy'; };
}
❯ nom build -f zluda.nix
❯ rsync -LP zluda.sif lumi.csc.fi:proj-nixpkgs/
❯ ssh lumi.csc.fi srun --account=project_$lumi_project --partition=small-g --ntasks=1 --gpus-per-node=1 --time=00:05:00 singularity exec proj-nixpkgs/zluda.sif saxpy-zluda
srun: job 6896251 queued and waiting for resources
srun: job 6896251 has been allocated resources
WARNING: passwd file doesn't exist in container, not updating
WARNING: group file doesn't exist in container, not updating
Start
Runtime version: 12020
Driver version: 12020
Host memory initialized, copying to the device
Scheduled a cudaMemcpy, calling the kernel
Scheduled a kernel call
Max error: 0.000000
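For quick local experiments, the same `LD_LIBRARY_PATH` trick could also be tried from a development shell instead of a wrapper. This is a rough sketch, assuming the `<nixpkgs>` on your search path already contains the `zluda` attribute; it has not been run on the setup above:

```nix
# Sketch only: a shell in which CUDA programs pick up ZLUDA's libraries.
with import <nixpkgs> { };
mkShell {
  # makeLibraryPath expands to the "lib" directory of each listed package.
  # Note that this sets the variable outright rather than prepending to it.
  LD_LIBRARY_PATH = lib.makeLibraryPath [ zluda ];
}
```

Entering it with `nix-shell` and then launching the unmodified CUDA binary should have the same effect as the `makeWrapper` call, minus the persistent wrapper script.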
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/testing-gpu-compute-on-amd-apu-nixos/47060/4 |
It looks like Hydra basically only successfully built zluda once after it was merged, and has been broken ever since (https://hydra.nixos.org/job/nixos/trunk-combined/nixpkgs.zluda.x86_64-linux/all?page=2). Not sure what the proper way to handle this is, but I'm guessing I should probably open a PR to mark this as broken, right? |
There's currently a PR (#383757) on its way that'll likely fix that :) It also removes the ugly cargo mess from the package. |
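For reference, marking a package as broken comes down to setting `meta.broken`. The overlay below is only an illustration of that attribute, not what either PR actually does; in nixpkgs itself the flag would be set directly in the package's `meta`:

```nix
# Sketch only: flag zluda as broken from an overlay.
final: prev: {
  zluda = prev.zluda.overrideAttrs (old: {
    # Merge into the existing meta so description, license etc. survive.
    meta = (old.meta or { }) // { broken = true; };
  });
}
```

With this in place, evaluating `zluda` fails early with a "marked as broken" message unless `allowBroken` is enabled, instead of failing midway through a long build.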
Description of changes
Goal is to provide package for ZLUDA ( #288392 ), letting you run unmodified CUDA applications with AMD GPUs.
See comment below for current issue blocking the build.
In addition: as this only provides /lib contents, should this be named libzluda or similar?
Things done
- Sandboxing in nix.conf (see Nix manual): sandbox = relaxed / sandbox = true
- nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD" (note: all changes have to be committed; also see nixpkgs-review usage)
- ./result/bin/
Add a 👍 reaction to pull requests you find important.