Parallelisation on RPC produces question marks #13716
Hey. We're encountering a bit of trouble and we're not sure how to approach solving it. I'm not confident enough to post an issue, so I wanted to ask if anyone has any ideas. We have three computers on a 2.5Gb ethernet LAN:
We've been running llama.cpp with RPC across these machines without problems, using a build we cloned from GitHub on 4 May and compiled ourselves. We pass '--parallel 2' to the server so that two users can use the system simultaneously.

Yesterday we refreshed our local copy with a 'git pull', copied it onto the three machines, compiled with the same environment variables and flags, and tried to run the newer version. Inference works fine for about a minute, but then both clients start receiving an unbroken stream of question marks with no spaces. Once that starts, we can't get any other output without restarting the server. If we remove the '--parallel 2' parameter, it works fine (no question mark spam), but this isn't ideal because only one of us can use the system at a time.

What we have tried so far:

We did another 'git pull' 24 hours later, copied it onto the three machines, recompiled, and retested. The results are the same: about one minute of good inference, then question marks on both clients. Disabling parallelisation makes it work again, but for one user only.

We tried disabling Sliding Window Attention with the '--swa-full' parameter, but that makes no difference; we still get question mark spam.

We compiled a ROCm-only version and a Vulkan-only version for the third machine, but the backend doesn't seem to make a difference.

We ran the same model on the CUDA+Windows devices only (that is, computers one and two) with a smaller context, but the problem persists.

The model we are using is an i1-IQ4_XS gguf of Command A (c4ai-command-a-03-2025.i1-IQ4_XS.gguf).

Is there anything on our end that could be causing this fault? Is there anything we can do to debug what's producing these question marks?
Replies: 1 comment 7 replies
If the last working build is from May 4th, the simplest thing you can do is a binary search through the git history to find the offending commit. There are ~185 commits since then, which means you can find the offending commit in about 8 steps, each involving a build, deploy and test. Once you have the offending commit, it will be much easier to debug the root cause of this issue.
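To show where the "8 steps" figure comes from, here is a minimal Python sketch of the same binary search. It only simulates the process: the commit names and the `is_bad` callback are hypothetical stand-ins for "check out this commit, build, deploy to all three machines, run with '--parallel 2' and watch for question marks". In practice `git bisect` does the bookkeeping for you; this is just the arithmetic behind it.

```python
import math
from typing import Callable, Sequence, Tuple


def max_bisect_steps(num_commits: int) -> int:
    """Upper bound on build/test rounds needed to isolate one bad commit."""
    return math.ceil(math.log2(num_commits)) if num_commits > 1 else 0


def find_first_bad(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, int]:
    """Binary search for the first bad commit.

    `commits` is ordered oldest to newest and contains only commits made
    after the known-good build; the newest one is assumed to be bad.
    `is_bad` is a hypothetical callback standing in for one manual
    build/deploy/test round. Returns (first bad commit, rounds used).
    """
    lo, hi = 0, len(commits) - 1  # the first bad commit lies in [lo, hi]
    rounds = 0
    while lo < hi:
        mid = (lo + hi) // 2
        rounds += 1
        if is_bad(commits[mid]):
            hi = mid       # regression is at mid or earlier
        else:
            lo = mid + 1   # regression is after mid
    return commits[lo], rounds


if __name__ == "__main__":
    # Simulated history: 185 made-up commit names, regression at c120.
    history = [f"c{i}" for i in range(1, 186)]
    culprit, rounds = find_first_bad(history, lambda c: int(c[1:]) >= 120)
    print(culprit, rounds, max_bisect_steps(len(history)))
```

The search assumes a single regression point, i.e. every commit before the culprit is good and every commit from it onward is bad. Since the fault takes about a minute to appear, each test round needs to run at least that long before a commit is marked good.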
Thank you for the information. Follow #13733 for the resolution.