ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched#17276
3df2f6d to 6d90fe9
|
I tried this change after reverting #17143, but it doesn't trigger an error using the
llama.cpp/ggml/src/ggml-alloc.c, lines 1052 to 1055 in 6d90fe9
|
I was trying to reproduce this, but I get this assert when running
|
I think these asserts can be safely removed. Will take a look tomorrow. |
|
The asserts are now removed on
6d90fe9 to 0710d5f
|
I have verified that the Vulkan issue is indeed caused by the graph order differing depending on batch size. The code responsible seems to be this:

llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp, lines 12954 to 12967 in fdbff91

I think this should be fixed in the Vulkan backend so that changes in tensor sizes do not change the order of the graph. Meanwhile, we could run the tests with GGML_VK_DISABLE_GRAPH_OPTIMIZE.
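To illustrate why a size-dependent node order matters when reallocations are disabled, here is a minimal sketch of a toy memory model. It is not ggml's allocator: `Node`, `example_graph`, `peak_bytes`, and `fits_reserved` are hypothetical names invented for this example, and the model assumes each tensor is freed right after its last consumer runs. It only shows that two valid orders of the same graph can need different peak buffer sizes, so a buffer reserved for one order may be too small for another.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy node: one output tensor plus the nodes it reads from.
struct Node {
    std::size_t size;       // bytes needed for this node's output
    std::vector<int> deps;  // indices of the nodes this node consumes
};

// Two large inputs (0, 1), two small reductions (2, 3), one final add (4).
std::vector<Node> example_graph() {
    return {
        {100, {}},      // 0: a
        {100, {}},      // 1: b
        {10,  {0}},     // 2: a2 = f(a)
        {10,  {1}},     // 3: b2 = f(b)
        {10,  {2, 3}},  // 4: out = a2 + b2
    };
}

// Peak live bytes when the nodes are executed in the given order.
// Assumption of this toy model: an output is released immediately after
// its last consumer has executed; unconsumed outputs stay live.
std::size_t peak_bytes(const std::vector<Node>& nodes, const std::vector<int>& order) {
    const int n = static_cast<int>(nodes.size());
    std::vector<int> last_use(nodes.size(), -1);
    for (int pos = 0; pos < static_cast<int>(order.size()); ++pos) {
        assert(order[pos] >= 0 && order[pos] < n);
        for (int d : nodes[order[pos]].deps) {
            last_use[d] = pos;  // later positions overwrite earlier ones
        }
    }

    std::vector<bool> freed(nodes.size(), false);
    std::size_t live = 0;
    std::size_t peak = 0;
    for (int pos = 0; pos < static_cast<int>(order.size()); ++pos) {
        const Node& node = nodes[order[pos]];
        live += node.size;  // allocate the output of this node
        peak = std::max(peak, live);
        for (int d : node.deps) {
            if (last_use[d] == pos && !freed[d]) {
                live -= nodes[d].size;  // input is no longer needed
                freed[d] = true;
            }
        }
    }
    return peak;
}

// True if a buffer of `reserved` bytes is enough for this order,
// i.e. no reallocation would be needed in this toy model.
bool fits_reserved(std::size_t reserved, const std::vector<Node>& nodes,
                   const std::vector<int>& order) {
    return peak_bytes(nodes, order) <= reserved;
}
```

In this toy graph, the order `{0, 2, 1, 3, 4}` peaks at 120 bytes, while `{0, 1, 2, 3, 4}` peaks at 210 bytes. A buffer reserved for the first order would therefore not fit the second, which is the kind of mismatch that a no-reallocation check is meant to surface.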
|
…gml_backend_sched Enabled in ggml-ci for testing.
fdbff91 to 5a9485c
…gml_backend_sched (ggml-org#17276)
* ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched
  Enabled in ggml-ci for testing.
* llama : update worst-case graph for unified cache
* ci : disable op offload in some tests
* fix spelling
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
No description provided.