llama-cpp: add gpu layers parameter #4739
Conversation
langchain/llms/llamacpp.py (Outdated)

@@ -64,6 +64,9 @@ class LlamaCpp(LLM):
     """Number of tokens to process in parallel.
     Should be a number between 1 and n_ctx."""

+    n_gpu_layers: Optional[int] = 0
As far as I can tell this isn't optional in the llama_cpp package, so we probably don't want to allow it to be optional here?
Same as the other params in this wrapper; in the llama-cpp-python project it's optional with a default value of 0:
https://github.com/abetlen/llama-cpp-python/blob/main/llama_cpp/llama.py#L86
tested without this param:
llm = LlamaCpp(model_path=...)
Output:
llama_model_load_internal: [cublas] offloading 0 layers to GPU
By optional I mean typed as Optional (meaning users could pass in None).
Yes, correct in that regard; updated, is it better now?
n_gpu_layers: Optional[int] = Field(0, alias="n_gpu_layers")
I'm suggesting we make it n_gpu_layers: int = ...
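For readers unfamiliar with the Pydantic convention being referenced: an ellipsis (...) as the default marks a field as required. A minimal sketch of the two options under discussion (the model classes here are illustrative, not the wrapper itself):

from typing import Optional
from pydantic import BaseModel, Field

class RequiredSketch(BaseModel):
    # Ellipsis as the default makes the field required:
    # constructing without it fails validation.
    n_gpu_layers: int = Field(..., alias="n_gpu_layers")

class OptionalSketch(BaseModel):
    # A default of None keeps the field optional; callers may omit it entirely.
    n_gpu_layers: Optional[int] = Field(None, alias="n_gpu_layers")

RequiredSketch(n_gpu_layers=3)   # ok
OptionalSketch()                 # ok, n_gpu_layers stays None
# RequiredSketch()               # would raise a ValidationError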
Actually updated to leave it as None and not pass it in if it is None, since passing it unconditionally would break backwards compatibility (n_gpu_layers support was only recently added to llama-cpp-python).
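Not the merged code, just a minimal sketch of the "leave it as None and skip it" approach (the class and the _default_params name are assumptions for illustration):

from typing import Any, Dict, Optional

class LlamaCppParamsSketch:
    """Illustrative stand-in for the wrapper, not the actual LlamaCpp class."""

    def __init__(self, n_gpu_layers: Optional[int] = None) -> None:
        self.n_gpu_layers = n_gpu_layers

    def _default_params(self) -> Dict[str, Any]:
        params: Dict[str, Any] = {}  # other llama.cpp kwargs would be collected here
        # Only forward n_gpu_layers when the user set it, so older
        # llama-cpp-python releases without the kwarg keep working.
        if self.n_gpu_layers is not None:
            params["n_gpu_layers"] = self.n_gpu_layers
        return params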
I'm on langchain 0.0.173, updated today, and setting n_gpu_layers=3 in my clone of privateGPT. I am seeing activity on my ATI onboard graphics, but in the Windows 11 performance manager I am seeing a flat line on my 3070. How do I target my discrete graphics card? I am not experiencing much of a speedup. Here is how I am configured:
@thekit under the hood langchain.llms.LlamaCpp uses llama-cpp-python. For reference, here is how I'm doing it (Ubuntu 22), with an update-llama.sh script in the langchain folder:

#!/bin/bash
export CMAKE_ARGS="-DLLAMA_CUBLAS=on -DLLAMA_NATIVE=on";
export FORCE_CMAKE=1;
pip uninstall llama-cpp-python -y
pip --no-cache-dir install llama-cpp-python
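Once llama-cpp-python is rebuilt with cuBLAS enabled, offloading is requested through the same parameter. A usage sketch (the model path and layer count are placeholders, and the exact log line depends on your llama.cpp build):

from langchain.llms import LlamaCpp

# Placeholder path; tune n_gpu_layers to what fits in your GPU's VRAM.
llm = LlamaCpp(model_path="/path/to/model.bin", n_gpu_layers=32)

# With a cuBLAS build the model load should log something like:
#   llama_model_load_internal: [cublas] offloading 32 layers to GPU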
Adds a gpu layers parameter to the llama.cpp wrapper.
After the change:
llm = LlamaCpp(model_path=..., n_gpu_layers=3)
Output:
For review: