Hi everyone, I have a question about the --gpulayers number and how to determine what the proper number is. I have been using --gpulayers 1 and it works fine for now, but I would like to know if there is a proper way to determine what it actually SHOULD be. The command I'm using to start koboldcpp works great right now with no problems; I'm just looking to see if I can grind any more performance out of it. So far I've found that I can only load 7B files with acceptable response speeds. The 13B files will load, but take forever (and a day) to respond to any chats. I'm sure that's more of a RAM problem though. Thanks in advance.
--gpulayers 1 literally just offloads a single layer of the model to the GPU, which isn't going to be very much.

Each model has a different number of layers. A 7B model typically has about 32 layers, a 13B model has about 43, and a 70B model may have almost 80 layers. The best way to determine how many layers you can offload is by trial and error: pick a value and see how much VRAM is in use once the model has loaded. If you exceed your available VRAM, the program will close and you can try again with a lower value.
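
If you want a starting point for that trial and error instead of guessing blind, here is a rough sketch in Python. It is not something koboldcpp provides. It assumes every layer takes about the same share of the model file and that you hold back some VRAM for context and scratch buffers. The function name, the `reserve_gb` default and the example file size are all made up for illustration, so treat the result as a first guess and adjust up or down from there.

```python
def estimate_gpu_layers(model_file_gb, total_layers, free_vram_gb, reserve_gb=1.5):
    """Rough first guess for --gpulayers (hypothetical helper, not part of koboldcpp).

    Assumes each layer uses an equal share of the model file size and that
    reserve_gb of VRAM should stay free for context and scratch buffers.
    Both assumptions are approximations.
    """
    if model_file_gb <= 0 or total_layers <= 0:
        raise ValueError("model_file_gb and total_layers must be positive")

    per_layer_gb = model_file_gb / total_layers       # approximate size of one layer
    usable_gb = max(free_vram_gb - reserve_gb, 0.0)   # VRAM left after the reserve
    guess = int(usable_gb / per_layer_gb)             # round down to whole layers

    # Never suggest more layers than the model actually has.
    return min(total_layers, guess)


if __name__ == "__main__":
    # Illustrative numbers only: a 13B file of about 7.4 GB with about 43 layers,
    # on a card with 6 GB of free VRAM.
    print(estimate_gpu_layers(7.4, 43, 6.0))
```

Start from the number it prints, check VRAM usage after loading, and then raise or lower the value as described above.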