[BUG] Random slowdowns in tensor parallel. #630

Ph0rk0z · 2024-09-21T13:31:08Z

OS

Linux

GPU Library

CUDA 11.8

Python version

3.10

Pytorch version

2.4

Model

Luminum 123b 4.0bpw H6

Describe the bug

I'm happily generating at 11-16 t/s, then random messages suddenly go really slow and the t/s drops. After a few more messages it goes back up.

I checked GPU temps in nvtop, but they were all in the 60s. One GPU is cranking while the others sit at low utilization for some reason, as if it were processing sequentially.
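A minimal sketch for catching that "one GPU busy, the rest idle" pattern the next time it happens, assuming `nvidia-smi` is on the PATH and reports numeric utilization for every card. The function names and the 80%/20% thresholds are hypothetical, chosen only for illustration; they are not part of exllamav2 or nvtop.

```python
import subprocess

# Standard nvidia-smi query flags; prints one "index, util, temp" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,temperature.gpu",
    "--format=csv,noheader,nounits",
]

def parse_gpu_stats(text):
    """Parse 'index, util, temp' CSV lines into a list of dicts."""
    stats = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Assumes all three fields are integers (no "[N/A]" values).
        index, util, temp = (int(field.strip()) for field in line.split(","))
        stats.append({"index": index, "util": util, "temp": temp})
    return stats

def looks_sequential(stats, busy=80, idle=20):
    """Hypothetical heuristic: exactly one GPU busy, every other GPU near idle."""
    busy_gpus = [s for s in stats if s["util"] >= busy]
    idle_gpus = [s for s in stats if s["util"] <= idle]
    return (
        len(stats) > 1
        and len(busy_gpus) == 1
        and len(idle_gpus) == len(stats) - 1
    )

def sample():
    """Take one reading from nvidia-smi (requires an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(out)
```

Calling `looks_sequential(sample())` once a second during generation and logging a timestamp whenever it returns `True` would show whether the slow messages line up with the lopsided utilization.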

I'm not sure if it's related to my machine, but it's new behavior for me since 2.2 and dev. Mostly probing to see if anyone has experienced the same.

Reproduction steps

Generate as normal. Some messages will be slow.
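Since the slowdown is intermittent, logging per-message throughput may help pin down when it starts. The sketch below is only an illustration: `ThroughputLog` and its `slow_ratio` threshold are hypothetical names, and the token count and elapsed time are assumed to come from whatever front end is driving the generation.

```python
import statistics

class ThroughputLog:
    """Track tokens/s per message and flag messages far below the running median."""

    def __init__(self, slow_ratio=0.5):
        # A message counts as slow if its rate is below slow_ratio * median
        # of all previously recorded messages (hypothetical threshold).
        self.slow_ratio = slow_ratio
        self.rates = []

    def record(self, num_tokens, seconds):
        """Record one message; return (tokens_per_second, is_slow)."""
        rate = num_tokens / seconds
        is_slow = False
        if self.rates:
            baseline = statistics.median(self.rates)
            is_slow = rate < self.slow_ratio * baseline
        self.rates.append(rate)
        return rate, is_slow
```

For example, after two messages at 16 t/s and 11 t/s, a message at 3 t/s would be flagged because it falls below half of the 13.5 t/s median.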

Expected behavior

Consistent speeds.

Logs

No response

Additional context

Acknowledgements

I have looked for similar issues before submitting this one.
I understand that the developers have lives and my issue will be answered when possible.
I understand the developers of this program are human, and I will ask my questions politely.

grimulkan · 2024-10-18T22:15:05Z

You're not rolling over the KV cache at ~4K context or something like that, right? I don't experience this with the latest dev pull, but I also usually don't have conversations longer than the pre-allocated sequence length. That said, paged attention is now pretty good at handling rolling cache too... so maybe not relevant.

Ph0rk0z · 2024-10-19T14:04:01Z

Nope, it's a 32k model. Haven't seen it crop up lately though.

Well, it happened again, on qwen2.5 at about 10k context. Switching to different prompts didn't fix it. I had to stop and restart the server, and suddenly replies with the same long context were fast again.

Ph0rk0z added the bug label on Sep 21, 2024

Ph0rk0z mentioned this issue Jan 23, 2025

[BUG] OpenAI client takes a long time to receive the last token on every few generations theroyallab/tabbyAPI#274

