Support rusticl runtime #276
I'm having a play trying to get this to work, but as I don't know OpenCL or why sum64 is used, this is a crapshoot. My initial read was that the extension was needed for 64-bit ints to work at all, but it appears to be needed only for atom_add. And atom_add is just used to return the sum value to the host; seemingly, atomic in this context ensures host and device are in sync. So I tried replacing it with the 32-bit version, which should always be present:
Which ran, straight into a rusticl panic. Probably this is some deep syntax crime anyway; at the least, splitting into multiple atomic ops seems bad form:
But why do we need atomic at all right?
And just to be safe, delete the return value in sum64 in case the syntax messes anything up. That should completely bypass sum64 execution, unless it's being called somehow from other kernels by a name that I can't grep. However, still a panic:
That leads me to think that the runtime panics are entirely unrelated to the sum64 hackery? They've only been exposed now because the hackery let the kernel compile. Sorry for being verbose; it seemed sensible to let you know how I'm stabbing in the dark, as there's a good chance I'm misunderstanding how sum64 is used and crimes are being committed. |
Thanks, attempting to compile with rusticl is a useful exercise. And it's a good approach to simplify/remove the initial trouble bits (sum64) like you did, just to get it to compile. It seems what we hit now may be a rusticl bug. We don't have the rusticl stack-trace symbols, but there is a line number, and we know it tries to unwrap a None. |
Probably there is a bug in the fp64 implementation; it's still experimental after all. I'll look into it, ability permitting, and add an issue to rusticl's tracker if appropriate. I probably need to learn how to compile and use mesa first; if they reply I should be in a position to respond and test.

Actually, the more I look at it, the more I think it was premature to test fp64. It's available behind a flag, but it seems that's for implementers to be able to test it as they implement; it's described as "in-progress" on their tracker: https://mesamatrix.net/

I've been peeking at rusticl's code to see how hard it would be to implement the extension, but I'm struggling to even find atomic_add right now. I'm thinking that core CL functionality might not be done via rust/rusticl but directly in SPIR-V or something (unclear). All I can find in the Rust codebase is code exposing the 32-bit atomic support, not an implementation. int64 atomics is one of the few extensions that clover supports that rusticl doesn't. Extensions have been implemented in rusticl on a priority basis; maybe very few programs make use of it and it's just low priority. But maybe we'll get lucky and it'll have higher priority just because they want to fully supersede clover. |
Which rusticl version are you using? I'm using rusticl 23.2.1-1ubuntu3.1~22.04.2 and it does not produce the nice error messages you're seeing :) (it fails, but I can't see where) |
Mesa 23.3.5, the latest Arch packages. But I'm not even on an Arch host; this is through a temporary Arch environment with distrobox, which you could do too. It's as simple as |
This is with the latest sources:
which corresponds to this in vtn_variables.c:
Possibly I didn't use the correct compiler settings to build mesa; possibly this is just where rusticl is at. Here's the gist of how I compiled mesa
To do this yourself you'll need recent toolchains, at least meson 1.3.1, so either compile it yourself or take the easy route and build in a Fedora 39 environment, which has a recent enough meson. In an F39 environment you can also get most of the build dependencies easily by installing the builddep plugin for dnf |
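For anyone following along, a rusticl-enabled mesa build roughly follows this shape. This is a hedged sketch, not the commenter's exact recipe: the option names come from mesa's meson_options.txt and may drift between releases, and the prefix path is arbitrary.

```shell
# Build mesa with the rusticl OpenCL frontend enabled (sketch; verify
# option names against your mesa checkout's meson_options.txt).
git clone --depth 1 https://gitlab.freedesktop.org/mesa/mesa.git
cd mesa
meson setup build --prefix="$HOME/mesa-rusticl" \
  -Dgallium-drivers=radeonsi \
  -Dgallium-rusticl=true \
  -Dllvm=enabled \
  -Dbuildtype=release
ninja -C build install

# Point the ICD loader at the fresh install; rusticl only exposes
# devices for drivers listed in RUSTICL_ENABLE.
export OCL_ICD_VENDORS="$HOME/mesa-rusticl/etc/OpenCL/vendors"
export RUSTICL_ENABLE=radeonsi
clinfo | grep -i rusticl   # sanity check that the platform shows up
```

The RUSTICL_ENABLE step is easy to miss: without it, rusticl builds fine but reports no devices.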
If I'm interpreting things correctly, rusticl doesn't yet support the OpenCL 2.0 extension that adds things like work_group_reduce_add(): https://registry.khronos.org/OpenCL/sdk/3.0/docs/man/html/workGroupFunctions.html This corresponds to this feature on the tracker: https://mesamatrix.net/#RusticlOpenCL2.0_Extension__Workgroup_Collective_Functions |
This is in standby from my POV, maybe in 6 months we'll have another go. |
Tried this again with the latest mesa. Now meson 1.4.0+ is required to build mesa, which I got by upgrading to Fedora 40 (alternatively, build meson from source). Also needed to add PyYAML (
Same blocker as before. Will try again in a few months. |
That's what I'm getting now with the latest rusticl from the Mesa 25.0 master branch.
Note the strange mention of "openclon12":
full log:
|
@oscarbg Could you please try with "-use NO_ASM" ? that should disable our use of __asm() in case that's producing the above error. |
@preda
|
@oscarbg that's impressive, it almost runs! :) The problem we hit now manifests in a consistent mismatch between the checksum computed GPU-side and CPU-side. Given that this happens on an iGPU, and with consistent values (that don't change across read attempts), it seems this is not a data transfer error. More likely, the problem is with the kernel that computes the checksum GPU-side, sum64() in etc.cl: Line 19 in 7a0ba6d
The 'tricky' elements in this kernel are: work_group_reduce_add() and the global atomic_add(). Maybe there's an error around there. The small kernel sum64() could be re-written with various experiments to try to isolate the error. At this point, I think it's worth opening an issue with rusticl, letting them know of this unexpected behavior (so they may become aware of a potential error there). The argument that it's unlikely to be "on our side" is that the code runs correctly across multiple GPUs, both AMD and Nvidia. |
Just did a quick test with an intel A310 GPU using the iris driver in mesa. gpuowl failed with similar issues to my last radeonsi attempt:
PRPLL failed because it looks like rusticl checks for and intentionally fails when fp64 is emulated:
I'll try with an intel B580 tomorrow which has native fp64. I expect the same results as @oscarbg as the iris and radeonsi drivers are in similar states, iris has a few more extensions implemented but they look irrelevant to us. |
I also have a Zen 4 iGPU and tried to reproduce @oscarbg's results. Got the same ASM result, so using
edit:
|
Attempted to fix the __builtin_s_sleep(). (s_sleep is present on RDNA2; it's the compiler that does not have the builtin.)
The GPU read check should be disabled in source (i.e. always have it pass) to see what fails then. This line: Line 674 in 002f686
should be changed to if (true || hostSum == gpuSum) { |
The sleep fix worked. Modifying that line in Gpu.cpp did this:
|
Well something's broken (the read check failure was probably not spurious). If it's not a problem with the check, then maybe the read is not working correctly. |
Tuned with existing fft as suggested:
Same issue:
Go up an FFT size:
The tune timings being 100000.0 looks suspicious. |
Tune timing "100000.0" is a huge value that indicates that it failed (so that config, which did not pass, should have basically "infinite" cost as it's broken). |
It is possible to run with "-use DEBUG" which enables asserts (in OpenCL) at runtime -- which is normally rather slow. But maybe we'd see some earlier failure. |
FWIW, bigger FFTs take longer to tune, as expected.
Funnily, testing the currently erroneous B580 gives numbers higher than 100000.0 at the larger FFTs: https://www.mersenneforum.org/node/1062411?p=1064299#post1064299 |
Would it help if I compiled a debug build? |
Commenting out the offending assert in trig.cl with Gpu.cpp untouched goes back to erroring in the same place as before:
Doing the Gpu.cpp line change as well goes back to this:
So -use DEBUG might have helped to find that undeclared (?) variable n, but that doesn't seem to have been the blocker. edit: Of course the assert wasn't the issue; without DEBUG the asserts are all removed. |
I opened a RustiCL issue to let them know of the situation: |
We have feedback from Karol Herbst that the workgroup_reduce() family is not implemented by RustiCL. So I just removed all use of workgroup_reduce() in the latest commit. Please retry. I have a strong suspicion not everything is fixed by this, though. Let's see how the read check behaves now. |
Clean
Remove assert in trig.cl
This is all with Gpu.cpp untouched. So that's progress? The checksums at least aren't mismatching. edit: |
On the B580 the LHS is zeroed instead of garbage:
It runs with ASM enabled, but should it be disabled because you've made ASM specific to AMD? edit: BTW, a gotcha between rusticl and other runtimes is that they may enumerate the GPUs differently. Unless it's random, the B580 is device 1 on rusticl and device 0 on Intel's runtime. edit edit: No change with a mesa build from 2024-12-24 |
This is the current status with B580:
Removing that assert and recompiling we don't trip any more asserts and are back at zeroed data. |
RDNA3 780M: Not sure why it's doing different things, but here's something new on the latest mesa:
Unknown intrinsics and now the data is zeroed where previously it was garbage. |
It may be that the "unknown intrinsic" is caused by the __builtin_nontemporal_load/store, which is recent (since the latest commit, 1582454). Although, looking more carefully, I see that not all of the load_global() calls that produce the "unknown intrinsic" above have access=non-temporal, so it's not that. |
Will install ROCm later to compare. |
@chocolate42 could you please try with "-carry long" on RustiCL and see whether it fixes the
|
780M, no change:
|
As long as we have those "Unknown intrinsic" for global read, it's broken. |
I think that's done it (tested with the PR). The 780M needs -carry long:
Succeeds with -carry long
Hash matches at iteration 2000 on a second run
Now I need to install rocm on this thing and see if there's any difference in timings. And test with the B580. A tomorrow job. edit: So when that PR gets merged, which may not be soon as mesa has a fancy way of merging PRs, prpll with AMD cards should work on rusticl? Intel cards use a different driver, so all bets are off there. |
Trying rocm to compare to the runs in previous post (Ubuntu 22.04 via distrobox, rocm 6.3.1, "make amd")
Resumed from a rusticl save fine. Doesn't need -use NO_ASM or -carry long (why is that?). ~10% faster than rusticl. But the GPU keeps hanging; not sure what the deal is there. Whenever the GPU hangs, the screen goes blank and comes back 5-10 seconds later. dmesg has this to say
So is that some rocm bug, or have I messed up installing rocm somehow? edit: I'm running the rusticl build some more to try to get it to crash. It's well past the point that the rocm build hung 3 times, and so far no problem. edit edit: After looking more closely, both builds average around 52W of total laptop draw from the wall in current conditions, but the rusticl build is very consistent at 49-53W, while the rocm build stays there for a bit before crashing to ~30W then spiking to ~65W, at which point it often but not always hangs the GPU. Seems like something in the stack isn't configured properly for the RDNA3 iGPU. |
[44436.803321] amdgpu: sdma_engine_id 1 exceeds maximum id of 0
...
That looks like an AMDGPU driver bug to me. That is, the AMDGPU driver
that's part of the Linux kernel (or that is installed with amdgpu-dkms
on older Linux kernels).
Unfortunately, there's a multitude of such AMDGPU bugs. If possible,
they should be reported "upstream". The workaround is to use a
different version of the Linux kernel, e.g. from
https://kernel.ubuntu.com/mainline/ . I'm using Linux kernel 6.8.x
(which is quite a bit behind the most recent 6.12.x) and AMDGPU seems
fine there.
|
Thanks. That's tricky to test because of the way distrobox works: the container shares the host kernel, which I don't want to mess with. I guess a live image (24.04, to match your 6.8 kernel) is the way (which will take hours to download; why is the Ubuntu image so massive compared to Fedora, 5.8GB vs 2.3GB, my god). In the meantime I managed to crash while running the rusticl build, but it was a herculean effort. It ran for about 300k iterations while a PS2 emulator was taxing the iGPU via Vulkan, and crashed after ~30 minutes of sustained power draw from the wall of ~80W. This is a thin-and-light laptop, and I'm pretty sure it overheated (or soft-crashed because it was overheating), so this might not be a bug bug, but the rocm crashes surely are. This logged me out of the session and closed everything but didn't fully crash the PC.
|
From a live image of Ubuntu 24.04, kernel 6.8, rocm 6.3.1: still getting GPU hangs, just like in distrobox. The screen goes blank; 10 seconds later it comes back after a successful reset. I'm guessing RDNA3 iGPU support is poor in rocm; AMD has always been terrible at supporting newer (and older, and current) hardware with rocm. Unless you have a rocm/kernel/prpll combo suggestion, or someone demonstrates they get it working, I'm going to write off rocm as broken with the 780M. This is a persistent live image, so rocm etc. is still installed on it and I can test further if need be.
|
I'm trying to run gpuowl on rusticl instead of rocm, it fails with this:
gpuowl.cl says this:
and the sum64 function looks simple enough:
If implementing sum64 with 32-bit atomics and sorting out the atom_add ambiguity (which appears to be related) is all it takes to get gpuowl working on rusticl, then cool. Rusticl is cross-platform and built into mesa (should be there OOTB in Ubuntu 24.04) and should be the way forward for OpenCL on Linux (also, mfakto is quicker under rusticl, for my hardware). I have no idea if atomic 64-bit int will ever be supported with rusticl, or if there are other hidden or vendor issues to be uncovered (I can only test with an RDNA3 iGPU, the 780M).