Parallelisation on RPC produces question marks #13716

becky-soda · 2025-05-23T03:27:23Z


Hey. We're encountering a bit of trouble and we're not sure how to approach solving it. I'm not confident enough to post an issue, so I wanted to ask if anyone has any ideas.

We have three computers on a 2.5 Gb Ethernet LAN:

An Intel i9 with a GeForce RTX 4090 (24GB VRAM) and a GeForce RTX 4060 Ti (16GB), running Windows 11.
An Intel i5 with two GeForce RTX 4060 Ti (16GB), running Windows 11.
An AMD Ryzen with one Radeon 7900XTX (24GB), running Ubuntu 24.04.

We had been running llama.cpp over RPC across these machines without problems, using a build we cloned from GitHub on 4th May and compiled ourselves. We use '--parallel 2' in the server parameters so that two users can use the system simultaneously.

Yesterday we refreshed our local copy with a 'git pull', copied that onto the three different machines, compiled using the same environment variables and flags, and attempted to run the newer version. The inference works fine for about a minute, but then begins spamming question marks without any spaces on both clients. Once it begins producing nothing but question marks, we can't make it produce anything else without restarting the server. If we remove the '--parallel 2' parameter, it works fine (with no question mark spam), but this isn't ideal as only one of us can use the system at a time.

We tried doing another 'git pull' 24 hours later and copying this onto the three machines, recompiling, and retesting, but the results are the same: about one minute of good inference, then lots of question marks on both clients. Disabling parallelisation makes it work fine again, but for one user only.

We tried using a full-size SWA (sliding window attention) cache with the '--swa-full' parameter, but that makes no difference, and we still get question mark spam. We also tried compiling a ROCm-only version and a Vulkan-only version for the third machine, but the backend doesn't seem to make a difference. Finally, we tried running the same model on the CUDA/Windows machines only (computers one and two) with a smaller context, but the problem persists.

The model we are using is an i1-IQ4_XS gguf of Command A (c4ai-command-a-03-2025.i1-IQ4_XS.gguf).

Is there anything that could be causing this fault from our end? Is there anything we can do to debug what's causing these question marks?

Answered by ggerganov

May 24, 2025

Thank you for the information. Follow #13733 for the resolution.


rgerganov · 2025-05-23T06:59:57Z

Collaborator

> Is there anything we can do to debug what's causing these question marks?

If the last working build is from May 4th, the simplest thing you can do is a binary search through the git history to find the offending commit. Right now there are ~185 commits since then, which means you can find the offending commit in about 8 steps, each involving a build, deploy and test. Once you have the offending commit, it will be much easier to debug the root cause of this issue.
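A minimal sketch of where the "8 steps" figure comes from and of the search itself. It assumes the candidate commits are ordered oldest to newest, the newest one is known to be bad, and every commit after the offending one is also bad. `is_bad` is a hypothetical stand-in for one build/deploy/test round; in practice `git bisect start`, `git bisect bad` and `git bisect good <commit>` do this bookkeeping for you.

```python
import math
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def bisect_steps(num_commits: int) -> int:
    """Worst-case number of build/deploy/test rounds needed to isolate
    one offending commit among num_commits candidates."""
    return math.ceil(math.log2(num_commits)) if num_commits > 1 else 0


def first_bad_commit(commits: Sequence[T], is_bad: Callable[[T], bool]) -> Tuple[T, int]:
    """Binary search for the first commit where is_bad() returns True.

    Assumes commits are ordered oldest -> newest, commits[-1] is known bad,
    and every commit after the offending one is also bad.
    Returns (offending_commit, number_of_tests_run)."""
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1  # one build, deploy and test round
        if is_bad(commits[mid]):
            hi = mid  # the regression landed at mid or earlier
        else:
            lo = mid + 1  # the regression landed after mid
    return commits[lo], tests


print(bisect_steps(185))  # ~185 commits since the May 4th build; prints 8
```

Each round halves the remaining range, so 185 candidates shrink to one after at most ceil(log2(185)) = 8 tests.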


becky-soda May 24, 2025
Author

Flash Attention is enabled. Here is the command we are using:

llama-server -m "X:\c4ai-command-a-03-2025.i1-IQ4_XS.gguf" --host 192.168.2.69 --port 5002 -c 48160 --rpc 192.168.2.94:50052,192.168.2.70:50052 --n_gpu_layers 99 -fa --no-warmup --parallel 2
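To make the "test" part of each bisection round repeatable, something like the following could drive the server started with the command above. It sends two concurrent requests (matching `--parallel 2`) and flags output that has degenerated into question marks. This is a sketch under assumptions: it posts to llama-server's `/completion` endpoint with `prompt` and `n_predict` fields and reads the `content` field of the JSON reply. The helper names, the prompt and the 0.5 threshold are hypothetical, and the host and port are copied from the command above.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Host/port copied from the llama-server command above; adjust as needed.
SERVER_URL = "http://192.168.2.69:5002/completion"


def looks_degenerate(text: str, threshold: float = 0.5) -> bool:
    """Heuristic: treat the output as broken when at least `threshold` of the
    non-whitespace characters are '?' (the symptom described in this thread)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False
    return chars.count("?") / len(chars) >= threshold


def complete(prompt: str, n_predict: int = 512) -> str:
    """Send one non-streaming completion request and return the generated text."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)["content"]


def run_parallel_check(prompts: list[str]) -> bool:
    """Run the prompts concurrently (one per server slot).
    Returns True if any output looks degenerate."""
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        outputs = list(pool.map(complete, prompts))
    return any(looks_degenerate(o) for o in outputs)
```

For example, `run_parallel_check([prompt, prompt])` returning True would mark the commit under test as bad. Because the failure appeared only after about a minute of generation, a single short request may not trigger it, so several rounds or larger `n_predict` values might be needed.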

ggerganov May 24, 2025
Maintainer

Do you encounter the problem if you remove the -fa?

becky-soda May 24, 2025
Author

We disabled Flash Attention (removed the -fa), and the problem hasn't occurred.
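The problem only appeared with Flash Attention and `--parallel 2` combined, so a small helper that rebuilds the command from this thread with each option toggled could make the comparison systematic. This is a sketch that assumes the same paths, addresses and flags as the command above. `build_server_args` is a hypothetical helper name, and launching each configuration (e.g. with `subprocess.Popen`) is left out.

```python
from itertools import product

# Values copied from the llama-server command in this thread.
MODEL = r"X:\c4ai-command-a-03-2025.i1-IQ4_XS.gguf"
RPC = "192.168.2.94:50052,192.168.2.70:50052"


def build_server_args(flash_attn: bool, parallel: int) -> list[str]:
    """Rebuild the llama-server argument list with -fa and --parallel toggled,
    keeping the original flag order."""
    args = [
        "llama-server", "-m", MODEL,
        "--host", "192.168.2.69", "--port", "5002",
        "-c", "48160",
        "--rpc", RPC,
        "--n_gpu_layers", "99",
    ]
    if flash_attn:
        args.append("-fa")
    args.append("--no-warmup")
    if parallel > 1:
        args += ["--parallel", str(parallel)]
    return args


# All four combinations. The failing one reported here is flash_attn=True with parallel=2.
CONFIGS = [build_server_args(fa, p) for fa, p in product([False, True], [1, 2])]
```

Passing the arguments as a list rather than one string also avoids quoting problems with the Windows model path.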

ggerganov May 24, 2025
Maintainer

Thank you for the information. Follow #13733 for the resolution.

Answer selected by becky-soda

becky-soda May 29, 2025
Author

Thank you. I finally got around to testing, and it works perfectly. Thanks to everyone so much.
