
VERY VERY Slow on the rtx 4050 and i5-12455 and 16 gb ram #1719

Closed
Asory2010 opened this issue Jun 6, 2023 · 6 comments

@Asory2010

I also have cuBLAS enabled, and I have tried both the 13B and 7B models, but it takes ages to even emit one token. I am using these parameters:

main -i --interactive-first -r "### Human:" --temp 0 -c 2048 -n -1 --repeat_penalty 1.2 --instruct --color -m wizard-mega-13b.ggml.q4_0.bin


ghost commented Jun 6, 2023

Add the -t parameter to your prompt, perhaps -t 4.

You might try lowering the batch # for the model to begin responding quicker with -b 10 in your prompt.
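The two suggestions above can be folded into the original command. A minimal sketch, assuming the model file from the issue sits in the working directory next to the `main` binary (the command is echoed rather than executed here):

```shell
# Hypothetical local model path taken from the issue; adjust as needed.
MODEL="wizard-mega-13b.ggml.q4_0.bin"

# Original invocation with the suggested -t 4 (threads) and -b 10
# (smaller batch, so the first tokens appear sooner) added.
CMD="./main -i --interactive-first -r '### Human:' --temp 0 -c 2048 -n -1 \
--repeat_penalty 1.2 --instruct --color -t 4 -b 10 -m $MODEL"

echo "$CMD"
```

The right `-t` value depends on the CPU, so 4 is only a starting point, not a recommendation.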

@Asory2010 (Author)

> Add the -t parameter to your prompt, perhaps -t 4.
>
> You might try lowering the batch # for the model to begin responding quicker with -b 10 in your prompt.

It did not work ): plus it crashes after a while during the loading process.


ghost commented Jun 6, 2023

> Add the -t parameter to your prompt, perhaps -t 4.
> You might try lowering the batch # for the model to begin responding quicker with -b 10 in your prompt.
>
> It did not work ): plus it crashes after a while during the loading process.

If your token generation is extremely slow, then try -t 1 and work your way up from there. Here's more information, including GPU offload with cuBLAS:

https://github.com/ggerganov/llama.cpp/blob/master/docs/token_generation_performance_tips.md

This is the limit of my knowledge on the subject, so if it continues to crash, I suggest someone else troubleshoot with @Asory2010.
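The "start at -t 1 and work your way up" advice can be sketched as a quick sweep. The `./main` invocation below is commented out because it needs a local llama.cpp build and a model file; uncomment and adjust the paths to actually run it:

```shell
# Hypothetical model path from the issue; adjust to your setup.
MODEL="wizard-mega-13b.ggml.q4_0.bin"

# Try increasing thread counts and compare the timing summary that
# llama.cpp prints at the end of each run.
for T in 1 2 4 6 8; do
  echo "=== -t $T ==="
  # ./main -m "$MODEL" -t "$T" -n 32 -p "Hello" 2>&1 | tail -n 4
done
```

Whichever `-t` gives the highest tokens/second in the timing summary is the one to keep.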

gjmulder (Collaborator) commented Jun 7, 2023

Run top or atop to see how many threads are active on your CPU. As a rough rule of thumb, you want to set -t to the number of physical cores on your CPU (usually half the number of logical cores the system reports when hyper-threading is enabled).

Run nvidia-smi to see what is happening on your GPU. If your CPU isn't the bottleneck you should see 25-50% GPU utilisation after configuring -ngl.

EDIT: The Intel® Core™ i5-1245U Processor has 2 fast (performance) and 8 slow (efficiency) CPU cores. I'd try setting -t to 2, 4, 6, 8, and 10 to see whether the slow CPU cores actually help performance.
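The rule of thumb above can be sketched in a few lines of shell. This assumes `nproc` (coreutils) is available and that logical count is roughly double the physical count, which does not hold for hybrid P/E-core CPUs like the i5-1245U, so treat the result only as a starting point for the sweep:

```shell
# Logical CPUs reported by the OS; fall back to an arbitrary 8 if
# nproc is unavailable.
LOGICAL=$(nproc 2>/dev/null || echo 8)

# Rough rule of thumb: physical cores ~ half the logical cores.
PHYSICAL=$(( LOGICAL / 2 ))
if [ "$PHYSICAL" -lt 1 ]; then PHYSICAL=1; fi

echo "suggested starting point: -t $PHYSICAL"
```

On hybrid CPUs, benchmarking a range of `-t` values (as gjmulder suggests) beats any formula.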

@Asory2010 (Author)

> Run top or atop to see how many threads are active on your CPU. As a rough rule of thumb you want to set -t to the number of physical cores on your CPU (usually half the number of hypercores the system reports).
>
> Run nvidia-smi to see what is happening on your GPU. If your CPU isn't the bottleneck you should see 25-50% GPU utilisation after configuring -ngl.
>
> EDIT: The Intel® Core™ i5-1245U Processor has 2 fast and 8 slow CPU cores. I'd try to set -t to 2, 4, 6, 8, 10 to see if the slow CPU cores actually help performance.

Quick update: after some testing, the text generation became way faster, but the loading time is still slow. Why is that?

github-actions bot added the stale label Mar 25, 2024
This issue was closed because it has been inactive for 14 days since being marked as stale.
