I guess it was caused by useMmap?
llama.cpp enables useMmap by default. From what I found in your llama-node code example, you don't seem to enable mmap to reuse the file cache in memory; that is probably why you run out of memory, I think?
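If that's it, setting it explicitly in the load config should be enough. A sketch assuming llama-node's llama.cpp backend, with a placeholder model path:

```typescript
import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";

const llama = new LLM(LLamaCpp);

await llama.load({
  modelPath: "./models/ggml-model-q4_0.bin", // placeholder path
  enableLogging: true,
  nCtx: 2048,
  seed: 0,
  f16Kv: false,
  logitsAll: false,
  vocabOnly: false,
  useMlock: false,
  embedding: false,
  useMmap: true, // map the model file instead of copying it into process memory
});
```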
I'm afraid that is not the case. Before you updated the version of llama.cpp, I couldn't run my example (with or without setting useMmap). Now it doesn't crash, but it doesn't seem to be doing anything either.
I recorded a video comparing llama-node and llama.cpp:
llama-node-issue.mp4
As you can see, llama-node sort of freezes with the larger input, whereas llama.cpp starts emitting tokens after ~30 secs.
I'm trying to process a large text file. For the sake of reproducibility, let's use this. The following code:
Expand to see the code
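In essence, the script just loads the model and feeds the whole file in as a single prompt; the paths, config values, and sampling parameters below are placeholders rather than the exact ones:

```typescript
import fs from "fs";
import { LLM } from "llama-node";
import { LLamaCpp } from "llama-node/dist/llm/llama-cpp.js";

const llama = new LLM(LLamaCpp);

const run = async () => {
  await llama.load({
    modelPath: "./models/ggml-model-q4_0.bin", // placeholder path
    enableLogging: true,
    nCtx: 2048,
    seed: 0,
    f16Kv: false,
    logitsAll: false,
    vocabOnly: false,
    useMlock: false,
    embedding: false,
    useMmap: true, // tried with and without setting this
  });

  // The whole text file goes in as one prompt.
  const prompt = fs.readFileSync("./large-input.txt", "utf8"); // placeholder file

  await llama.createCompletion(
    {
      prompt,
      nThreads: 8,
      nTokPredict: 256,
      topK: 40,
      topP: 0.95,
      temp: 0.8,
      repeatPenalty: 1.1,
    },
    (response) => {
      process.stdout.write(response.token);
    }
  );
};

run();
```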
Crashes the process with a segfault error:
When I compile the exact same version of llama.cpp and run it with the following args:
It runs perfectly fine (with a warning, of course, that the context is larger than what the model supports, but it doesn't crash with a segfault).
Comparing the logs:
llama-node Logs
llama.cpp Logs
Looks like the context size in llama-node is set to 4 GB and the `kv self size` is twice as large as what llama.cpp used. I'm not sure if I'm missing something in my Load/Invocation config or if that's an issue in llama-node. Can you please have a look?
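For what it's worth, the KV cache grows linearly with the context length, so a doubled context setting would by itself explain a doubled `kv self size`. A rough back-of-the-envelope, assuming a 7B LLaMA model with an f16 KV cache:

```typescript
// Rough KV cache size estimate: one K and one V tensor per layer,
// each of shape [nCtx, nEmbd], stored as f16 (2 bytes per element).
const kvSelfSizeMiB = (nCtx: number, nLayer = 32, nEmbd = 4096): number =>
  (2 * nCtx * nLayer * nEmbd * 2) / (1024 * 1024);

console.log(kvSelfSizeMiB(2048)); // 1024 MiB with a 2048-token context
console.log(kvSelfSizeMiB(4096)); // 2048 MiB, i.e. twice as large
```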