ggml-vulkan: adds support for op CONV_TRANSPOSE_1D #13813


Open: wants to merge 4 commits into master

Conversation

etasnadi

  • ggml-vulkan: adds support for op CONV_TRANSPOSE_1D

  • test-backend-ops: adds additional tests for CONV_TRANSPOSE_1D

  • test-backend-ops: adds more sophisticated tests for CONV_TRANSPOSE_1D
@github-actions github-actions bot added the following labels on May 26, 2025: testing (Everything test related), Vulkan (Issues specific to the Vulkan backend), ggml (changes relating to the ggml tensor library for machine learning)
Number of additional tests reduced to 108.
@etasnadi
Author

etasnadi commented May 27, 2025

The Ubuntu 22 runner stopped because test-backend-ops crashes with a segfault. I've observed this behavior on an unmodified build before as well, so I don't know whether it's related to my modifications or whether they just make it more likely to happen.

Edit: when the app crashes with a segfault, it is always during exit, after the tests have been executed.

@jeffbolznv
Collaborator

It's showing failures in some of the conv_transpose_1d tests, e.g.:

29:   CONV_TRANSPOSE_1D(ne_input=[2173,1,1,1],ne_kernel=[1337,1,1,1],s0=1,p0=0,d0=1): [CONV_TRANSPOSE_1D] NMSE = 0.459058786 > 0.000000100 FAIL
29:   CONV_TRANSPOSE_1D(ne_input=[2173,1,1,1],ne_kernel=[1337,1,1,1],s0=2,p0=0,d0=1): [CONV_TRANSPOSE_1D] NMSE = 0.222932697 > 0.000000100 FAIL
29:   CONV_TRANSPOSE_1D(ne_input=[2173,1,1,1],ne_kernel=[1337,1,1,1],s0=3,p0=0,d0=1): [CONV_TRANSPOSE_1D] NMSE = 0.167964696 > 0.000000100 FAIL

Then it appears to crash on the first test after the conv_transpose_1d tests, which does seem like it's related to this change somehow.
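For context on the FAIL lines above: test-backend-ops compares the backend result against a CPU reference using a normalized mean squared error and fails the test when it exceeds a small threshold (0.000000100 in the log). A minimal sketch of such a metric in Python (illustrative only; the exact formula used by test-backend-ops may differ):

```python
def nmse(reference, observed):
    """Normalized mean squared error: sum of squared differences
    divided by the total squared magnitude of the reference."""
    num = sum((r - o) ** 2 for r, o in zip(reference, observed))
    den = sum(r ** 2 for r in reference)
    return num / den

# A bit-exact result gives NMSE = 0; anything above the test
# threshold (1e-7 in the log above) is reported as FAIL.
print(nmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```

NMSE values as large as 0.46, as in the log, indicate the output is substantially wrong rather than merely imprecise.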

@etasnadi
Author

> It's showing failures in some of the conv_transpose_1d tests, e.g.:
>
> 29:   CONV_TRANSPOSE_1D(ne_input=[2173,1,1,1],ne_kernel=[1337,1,1,1],s0=1,p0=0,d0=1): [CONV_TRANSPOSE_1D] NMSE = 0.459058786 > 0.000000100 FAIL
> 29:   CONV_TRANSPOSE_1D(ne_input=[2173,1,1,1],ne_kernel=[1337,1,1,1],s0=2,p0=0,d0=1): [CONV_TRANSPOSE_1D] NMSE = 0.222932697 > 0.000000100 FAIL
> 29:   CONV_TRANSPOSE_1D(ne_input=[2173,1,1,1],ne_kernel=[1337,1,1,1],s0=3,p0=0,d0=1): [CONV_TRANSPOSE_1D] NMSE = 0.167964696 > 0.000000100 FAIL
>
> Then it appears to crash on the first test after the conv_transpose_1d tests, which does seem like it's related to this change somehow.

That's interesting. I could finally reproduce these errors with llvmpipe, but not with the discrete GPU. I need time to investigate the root cause.

@jeffbolznv
Collaborator

Wild guess - any chance of uninitialized shared memory?

@etasnadi
Author

> Wild guess - any chance of uninitialized shared memory?

It could be, but there is also a chance that the code is correct and llvmpipe itself is the root cause of the failing tests.

I executed parameter sweeps over (Cin, K, L), and it turns out that if one parameter (or a combination of them) is large enough, the test fails. Even Cin=20000, L=64, K=4 fails, which is suspicious because Cin does not influence the logic much; Cin=20000, L=64, K=3 passes.
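To make the role of these parameters concrete, here is a minimal CPU reference for a single-output-channel 1-D transposed convolution (illustrative only; names and data layout are not ggml's actual implementation). The output length follows (L - 1)*s0 + d0*(K - 1) + 1 - 2*p0, and Cin only adds an outer accumulation loop, which matches the observation that it "does not influence the logic too much":

```python
def conv_transpose_1d(x, w, s0=1, p0=0, d0=1):
    """1-D transposed convolution, one output channel.
    x: list of Cin input rows, each of length L.
    w: list of Cin kernel rows, each of length K.
    Reference sketch only, not ggml's actual kernel."""
    L, K = len(x[0]), len(w[0])
    out_len = (L - 1) * s0 + d0 * (K - 1) + 1 - 2 * p0
    y = [0.0] * out_len
    for c in range(len(x)):      # input channels accumulate into y
        for i in range(L):       # each input element scatters...
            for k in range(K):   # ...across the (dilated) kernel
                o = i * s0 + k * d0 - p0
                if 0 <= o < out_len:
                    y[o] += x[c][i] * w[c][k]
    return y

print(conv_transpose_1d([[1.0, 2.0, 3.0]], [[1.0, 1.0]]))
# [1.0, 3.0, 5.0, 3.0]
```

With s0=2 the same input scatters without overlap, giving [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]; the total work grows with Cin*L*K, which is why large values of any one parameter can push a software rasterizer like llvmpipe past its limits.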

Argmax also segfaults on llvmpipe on my computer for some reason.

etasnadi added 2 commits May 28, 2025 23:06
* Removes extra whitespace.

* Adds int64->int32 casts to prevent possible warnings.
@etasnadi
Author

etasnadi commented May 28, 2025

> Wild guess - any chance of uninitialized shared memory?

No. I was not aware that loop iterations are simply skipped after reaching a certain limit on llvmpipe, so I decided to reduce the maximum test problem size to 1337x13 (input x kernel) in order to pass the tests on llvmpipe in commit 2813cf4. Llvmpipe starts to trim loops after ~30k iterations, so this number should be safe and also more realistic than my previous edge cases. 5120x512 also worked for me with 128 threads, resulting in ~20k iterations per thread.
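The per-thread iteration estimate above checks out, assuming the work is split evenly across the 128 threads of the workgroup (an assumption on my part; the actual shader partitioning may differ):

```python
# Total scatter work for a 5120x512 (input x kernel) problem,
# divided across 128 threads as described in the comment above.
total_iters = 5120 * 512
per_thread = total_iters // 128
print(per_thread)  # 20480, i.e. ~20k iterations per thread
```

This stays below the ~30k-iteration limit at which llvmpipe reportedly starts trimming loops.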
