Nvidia Power Management Mode makes Koboldcpp 50% Slower #1057
Hello, I would like to know why continuous text generation is faster when I translate text with the GGUF model "lmg-anon/vntl-llama3-8b-gguf". Continuous: 80 ms wait time. My specs: Many thanks.
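One way to put numbers on this is to time the same short request with and without an idle gap in front of it. The following is a minimal sketch, not a definitive benchmark: it assumes KoboldCpp is running locally on its default port 5001 and exposes the `/api/v1/generate` endpoint. The helper names `generate` and `time_requests` are made up for illustration.

```python
import json
import time
import urllib.request

# Assumption: KoboldCpp runs locally on its default port (5001).
API_URL = "http://localhost:5001/api/v1/generate"


def generate(prompt, max_length=40, url=API_URL):
    """Send one generation request to KoboldCpp and return the generated text."""
    payload = json.dumps({"prompt": prompt, "max_length": max_length}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["results"][0]["text"]


def time_requests(send, prompt, idle_seconds, runs=5,
                  sleep=time.sleep, clock=time.perf_counter):
    """Return the latency in milliseconds of each request.

    The function waits `idle_seconds` before every request, so the GPU has
    time to drop to a lower power state when idle_seconds is large.
    """
    latencies = []
    for _ in range(runs):
        sleep(idle_seconds)
        start = clock()
        send(prompt)
        latencies.append((clock() - start) * 1000.0)
    return latencies
```

Calling `time_requests(generate, "some game text", idle_seconds=0)` and then again with `idle_seconds=5` would show whether the pause between requests is what makes the difference.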
Replies: 1 comment
I found the cause: it is NVIDIA's "Power management mode". When I change it from Normal to Maximum Performance, my generation is 50% faster.
https://nvidia.custhelp.com/app/answers/detail/a_id/3130/~/setting-power-management-mode-from-normal-to-maximum-performance
I only use GGUF models to translate short text from games, so no more than 20-40 tokens are generated at a time.
I don't leave the power management mode on Maximum Performance all the time, because it uses too much power when I'm not using koboldcpp.
Instead, I use MSI Afterburner to load a profile when I need it.
How to make a profile that locks the GPU at its full clock speed: "curve editor > select the highest MHz your GPU can use and press L to lock it > save as a new profile".
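To check whether the locked profile is actually active, the current clock and performance state can be read with `nvidia-smi`. This is a rough sketch, assuming `nvidia-smi` is installed with the driver and available on the PATH. The helper names `parse_gpu_status` and `read_gpu_status` are made up for illustration.

```python
import subprocess

# Fields understood by `nvidia-smi --query-gpu=...`.
QUERY_FIELDS = [
    "pstate",                   # performance state, P0 is the highest
    "clocks.current.graphics",  # current graphics clock in MHz
    "clocks.max.graphics",      # maximum graphics clock in MHz
    "power.draw",               # current power draw in watts
]


def parse_gpu_status(csv_line):
    """Turn one CSV line from nvidia-smi into a dict keyed by field name."""
    values = [value.strip() for value in csv_line.split(",")]
    return dict(zip(QUERY_FIELDS, values))


def read_gpu_status():
    """Query nvidia-smi and return one dict per GPU.

    Assumption: nvidia-smi is on the PATH.
    """
    output = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [parse_gpu_status(line) for line in output.splitlines() if line.strip()]
```

Running `read_gpu_status()` once while idle and once with the profile loaded should show the graphics clock staying near the maximum when the lock is in effect.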