llama.cpp on Nvidia RTX-3500, RTX-A4500 dual, RTX-4090 dual #10
49G on CPU (of 64G RAM) - RTX-3500, Lenovo P1 Gen 6, i7-13800H
llama.cpp output excerpt (prompt "binary tree in java", generated Java truncated):

binary tree in java // create a node class to store data and pointers to left and right child nodes
[...]
}
}
You can also pass in an array of values to populate the tree: int[] values = { 4, 5, 7, 8 };
BinaryTree tree = new BinaryTree(values);
License: This project is released under the MIT license. See LICENSE for more details.
[end of text]

llama_print_timings: load time = 14205.60 ms
llama_print_timings: load time = 12756.12 ms
system_info: n_threads = 10 / 20 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0
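The system_info line above shows a CPU-only build (BLAS = 0) using 10 of the i7-13800H's 20 hardware threads. A minimal sketch of how such a run can be launched, assuming a locally built llama.cpp main binary and a placeholder GGUF path:

```python
import subprocess

# Placeholder paths: adjust to the local llama.cpp build and GGUF model.
LLAMA_MAIN = "./main"
MODEL = "models/capybarahermes-2.5-mistral-7b.Q8_0.gguf"

# CPU-only run matching the log above: 10 threads (of 20), no -ngl offload.
subprocess.run([
    LLAMA_MAIN,
    "-m", MODEL,
    "-t", "10",
    "-n", "512",
    "-p", "binary tree in java",
])
```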
Falcon 40B on CPU needs 80-100 GB (Falcon 180B needs ~400 GB).
At ~2.2 GB/s write on a Samsung 990 Pro NVMe it takes about a minute to combine the two split parts into one 96 GB file; take out -ngl 64.
All three parts a/b/c total 150 GB on SSD and 140 GB of RAM.
160 of 192 GB RAM in use at 91% CPU on the 13900K. I think I need segment c as well: 96 != 137.
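The split uploads have to be concatenated into a single GGUF before llama.cpp can load them. A minimal sketch of that step, with hypothetical file names following the *-split-a/b/c pattern of the parts mentioned above; only the byte concatenation is shown, and the NVMe write speed dominates the runtime:

```python
import shutil

# Hypothetical split-part names; substitute the real a/b/c segments.
parts = [
    "falcon-180b.Q4_K_M.gguf-split-a",
    "falcon-180b.Q4_K_M.gguf-split-b",
    "falcon-180b.Q4_K_M.gguf-split-c",  # segment c is needed too (96G != 137G)
]
combined = "falcon-180b.Q4_K_M.gguf"

# Straight byte concatenation into one file, in 64 MB chunks.
with open(combined, "wb") as out:
    for part in parts:
        with open(part, "rb") as src:
            shutil.copyfileobj(src, out, length=64 * 1024 * 1024)
```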
On the 13800H P1 Gen 6
On the 13900K desktop
Why slower?
CUDA on llama.cpp: adjusting the ENV variable works well - see below, or as a shortened fix, add the CUDA toolkit to PATH.
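A sketch of that environment setup, assuming a Windows machine with the CUDA 12.1 toolkit in its default location; both the path and the device index are assumptions to adjust locally:

```python
import os
import subprocess

# Assumed CUDA toolkit location; change the version directory to match the install.
cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin"
os.environ["PATH"] = cuda_bin + os.pathsep + os.environ["PATH"]

# Optionally pin work to one card on the dual RTX-A4500 / RTX-4090 boxes.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Confirm the driver and toolkit are visible before building/running llama.cpp.
subprocess.run(["nvidia-smi"])
subprocess.run(["nvcc", "--version"])
```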
Solving this using https://github.com/obrienlabs/CUDA-Programs/tree/main/Chapter01/gpusum as a reference, part of the book "Programming in Parallel with CUDA" by Richard Ansorge of the University of Cambridge: https://www.cambridge.org/core/books/programming-in-parallel-with-cuda/C43652A69033C25AD6933368CDBE084C
Revisit llama.cpp for NVIDIA GPUs; look at abetlen/llama-cpp-python#871
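For the llama-cpp-python route, a minimal sketch assuming the package is installed with CUDA support and a local GGUF file; the model path and layer count are placeholders:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/capybarahermes-2.5-mistral-7b.Q8_0.gguf",  # placeholder path
    n_gpu_layers=35,   # layers offloaded to the RTX GPU; tune to fit VRAM
    n_ctx=4096,
)

out = llm("Q: write a binary tree in Java\nA:", max_tokens=256)
print(out["choices"][0]["text"])
```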
PyTorch install: https://pytorch.org/get-started/locally/ - use the CUDA 12.1 build, not 12.2.
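A quick sanity check that the cu121 build of PyTorch actually sees the GPUs:

```python
import torch

print(torch.__version__)           # expect a +cu121 suffix
print(torch.version.cuda)          # expect "12.1", not "12.2"
print(torch.cuda.is_available())
print(torch.cuda.device_count())   # 2 on the dual-GPU machines
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```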
Checking the context length: outputs = model.generate(**input_ids, max_new_tokens=1000) is working.
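A minimal sketch around that generate call, with a placeholder Hugging Face model id and prompt; it only illustrates where max_new_tokens=1000 fits:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

input_ids = tokenizer("write a binary tree in java", return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=1000)  # the line from the note above
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```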
Python pip summary
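A small sketch that prints the installed versions of the packages used elsewhere in this issue; the package list itself is an assumption:

```python
from importlib.metadata import version, PackageNotFoundError

for pkg in ("torch", "transformers", "accelerate", "llama-cpp-python"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```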
llama-server on RTX-3500
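A sketch of querying a locally running llama.cpp server via its /completion endpoint; the host, port, and startup flags in the comment are assumptions:

```python
import json
import urllib.request

# Assumes the server was started with something like:
#   ./server -m capybarahermes-2.5-mistral-7b.Q8_0.gguf -ngl 35 --port 8080
payload = {"prompt": "write a binary tree in java", "n_predict": 256}

req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```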
see #7
test:
git clone https://github.com/ggerganov/llama.cpp
model:
https://huggingface.co/TheBloke/CapybaraHermes-2.5-Mistral-7B-GGUF
https://huggingface.co/TheBloke/CapybaraHermes-2.5-Mistral-7B-GGUF/blob/main/capybarahermes-2.5-mistral-7b.Q8_0.gguf
using w64devkit on the Lenovo P1 Gen 6 (RTX-3500 12G)
https://github.com/skeeto/w64devkit/releases
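A sketch tying the steps together: download the Q8_0 GGUF listed above (assuming huggingface_hub is installed) and run the w64devkit-built main binary with partial offload to the 12 GB RTX-3500; the layer count is a guess sized to fit VRAM:

```python
import subprocess
from huggingface_hub import hf_hub_download  # assumption: pip install huggingface_hub

# Repo and filename taken from the links above.
model_path = hf_hub_download(
    repo_id="TheBloke/CapybaraHermes-2.5-Mistral-7B-GGUF",
    filename="capybarahermes-2.5-mistral-7b.Q8_0.gguf",
)

subprocess.run([
    "./main",
    "-m", model_path,
    "-ngl", "28",   # partial offload for 12 GB of VRAM (a guess)
    "-n", "512",
    "-p", "write a binary tree in java",
])
```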