Llama.cpp a lot slower when using cmake compared to using w64devkit #594
v4lentin1879 asked this question in Q&A (Unanswered)
Did you make sure you are building in Release mode? (CMAKE_BUILD_TYPE=Release)
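For reference, a minimal sketch of what that looks like from the llama.cpp source directory (the `build` directory name is just a convention):

```shell
# Configure with an optimized Release build. With single-config generators
# (Makefiles, Ninja), leaving CMAKE_BUILD_TYPE unset typically produces an
# unoptimized build, which can be dramatically slower at inference.
cmake -B build -DCMAKE_BUILD_TYPE=Release

# Build; --config Release matters for multi-config generators (e.g. Visual Studio).
cmake --build build --config Release
```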
I'm running mistral-orca on llama.cpp on my Mac. The application I'm using it for runs on macOS and Windows; it's a TypeScript application using the node-llama-cpp library, which requires me to build the llama.cpp binaries with CMake.
My issue is that the llama.cpp binaries are a lot slower when built with CMake than when built with w64devkit on Windows. I'm not even at the step where I wrap the node-llama-cpp library around them yet.
Does anyone know why the CMake build is this slow while the w64devkit build isn't? Is there a flag or similar that could fix this? I'm running CPU-only inference on a 2019 MacBook Pro with a 6-core i7.
w64devkit:
llama_print_timings: load time = 2789.31 ms
llama_print_timings: sample time = 7.55 ms / 18 runs ( 0.42 ms per token, 2383.16 tokens per second)
llama_print_timings: prompt eval time = 1925.06 ms / 20 tokens ( 96.25 ms per token, 10.39 tokens per second)
llama_print_timings: eval time = 8256.93 ms / 18 runs ( 458.72 ms per token, 2.18 tokens per second)
llama_print_timings: total time = 23842.73 ms
cmake:
llama_print_timings: load time = 4133.27 ms
llama_print_timings: sample time = 5.71 ms / 18 runs ( 0.32 ms per token, 3153.47 tokens per second)
llama_print_timings: prompt eval time = 25917.85 ms / 19 tokens ( 1364.10 ms per token, 0.73 tokens per second)
llama_print_timings: eval time = 43493.77 ms / 18 runs ( 2416.32 ms per token, 0.41 tokens per second)
llama_print_timings: total time = 74989.23 ms
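A prompt-eval slowdown of this magnitude (10.39 vs 0.73 tokens per second) is consistent with an unoptimized build. One way to check what the CMake build was actually configured with, assuming the build directory is named `build`:

```shell
# List the cached configuration variables without re-running the configure step;
# look for CMAKE_BUILD_TYPE (empty or "Debug" means an unoptimized build).
cmake -L -N build | grep CMAKE_BUILD_TYPE
```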
Thanks a lot for your help!