Hi! :)

I'm using `llama-cpp-python==0.2.60`, installed with this command: `CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python`.

I can load a model with `type_k=8` and `type_v=8` (for a q8_0 KV cache). However, as soon as I try to generate anything with the model, it fails. In short: I can load a model with an 8-bit KV cache, but I can't actually run inference with it.
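Roughly what my loading and generation code looks like (the model path, `n_gpu_layers` value, and prompt below are placeholders, not my exact setup; `8` is `GGML_TYPE_Q8_0` in ggml's type enum):

```python
from llama_cpp import Llama

# Minimal sketch of the failing setup. type_k/type_v take raw ggml type
# ids; 8 corresponds to GGML_TYPE_Q8_0, i.e. a q8_0-quantized KV cache.
llm = Llama(
    model_path="./model.gguf",  # placeholder path
    n_gpu_layers=-1,            # placeholder: offload all layers to Metal
    type_k=8,                   # q8_0 K cache
    type_v=8,                   # q8_0 V cache
)

# Loading succeeds; the failure happens on the first generation call.
out = llm("Hello", max_tokens=16)
print(out["choices"][0]["text"])
```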
`uname -a`: `Darwin MacBook-Air.local 23.4.0 Darwin Kernel Version 23.4.0: Fri Mar 15 00:19:22 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T8112 arm64`