Replies: 4 comments 1 reply
-
Try using fewer threads.
-
You can specify a different number of threads to use during processing and generation. Please check blasthreads on the wiki.
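For example, something like this (a minimal sketch; I'm assuming the standard koboldcpp launch flags that correspond to the settings keys in the config posted later in this thread) keeps 8 threads for generation while using only 4 during BLAS prompt processing:

```
koboldcpp.exe --model E:/models/openhermes-2.5-mistral-7b.Q4_K_M.gguf --usecublas normal 0 --threads 8 --blasthreads 4
```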
-
Setting it to 'max threads' will slow it down, in my experience. Try something like four, or experiment with a few values (0, 2, 4, 6).
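If you launch from a settings file rather than the command line, that would mean changing only the BLAS thread count, e.g. (a sketch based on the config posted below; the value 4 is just this comment's suggestion, not a measured optimum):

```json
"threads": 8,
"blasthreads": 4
```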
-
I just noticed that koboldcpp 1.62 fixed this 🙂 So some newer commit in llama.cpp must have addressed this behavior again. For anyone reading this, it's very much worth updating for this bug fix.
-
Hi, basically what I noticed is that since version 1.48.1, the program uses all CPU cores during the initial prompt ingestion, with seemingly no performance benefit when using cuBLAS. With version 1.47.2, it only used a few CPU cores during prompt ingestion. Since the GPU does the primary work during initial prompt ingestion with cuBLAS, I have not seen any speed benefit from all CPU cores being used in the newer versions during that phase.
So I got curious: was a change made to use all cores anyway to speed the process up a bit, or is it a bug? Since the GPU is much faster and full CPU usage is only needed after prompt ingestion (when generating new text), it seems to me that this may be unintended, as it unnecessarily increases power usage. I also tested intentionally throttling my processor, and it barely changed the initial prompt ingestion speed on the GPU, which is why I find this behavior unusual.
Just curious of course; the program still works fine with this behavior. Thank you!
Some additional information:
Operating system: Windows 10 22H2
Model used: openhermes-2.5-mistral-7b.Q4_K_M.gguf
Settings from the settings file:
{"model": null, "model_param": "E:/models/openhermes-2.5-mistral-7b.Q4_K_M.gguf", "port": 5001, "port_param": 5001, "host": "", "launch": false, "lora": null, "config": null, "threads": 8, "blasthreads": 8, "highpriority": false, "contextsize": 4096, "blasbatchsize": 256, "ropeconfig": [0.0, 10000.0], "smartcontext": false, "noshift": false, "bantokens": null, "forceversion": 0, "nommap": false, "usemlock": false, "noavx2": false, "debugmode": 0, "skiplauncher": false, "hordeconfig": null, "noblas": false, "useclblast": null, "usecublas": ["normal", "0"], "gpulayers": 0, "tensor_split": null, "onready": "", "multiuser": false, "remotetunnel": false, "foreground": false}