Server slowing down with each request (requests are identical) #4201
Comments
It's not something that only happens with the server example; it also occurs with the ./main example, so the problem is internal: performance regresses as the context fills up.
Thanks for confirming @FSSRepo. Should this be moved to
If the cache is cleared correctly, it would not be slower after each request, so there seems to be a server-specific problem.
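The reasoning above can be illustrated with a toy cost model. This is an assumption for illustration only, not llama.cpp's actual cost. Suppose each processed token costs time proportional to the number of tokens already in the KV cache. Then an identical request that starts on top of stale context left by earlier requests costs more than one that starts from a cleared cache. The names `request_cost` and `simulate` are hypothetical.

```python
def request_cost(prompt_len: int, gen_len: int, start_ctx: int) -> int:
    """Toy cost of one request (hypothetical model): each new token
    attends over every token already in the cache, plus itself."""
    cost = 0
    ctx = start_ctx
    for _ in range(prompt_len + gen_len):
        cost += ctx + 1  # work grows with the current context length
        ctx += 1
    return cost


def simulate(n_requests: int, prompt_len: int, gen_len: int, clear_cache: bool) -> list[int]:
    """Cost of n identical requests, with or without clearing the cache between them."""
    costs = []
    ctx = 0
    for _ in range(n_requests):
        if clear_cache:
            ctx = 0  # a correctly cleared cache: every request starts empty
        costs.append(request_cost(prompt_len, gen_len, ctx))
        ctx += prompt_len + gen_len  # otherwise, leftover tokens accumulate
    return costs
```

Under this model, `simulate(3, 10, 5, clear_cache=False)` returns `[120, 345, 570]`, while `clear_cache=True` returns `[120, 120, 120]`. Identical requests only get slower when context carries over.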
Upon careful review, it seems so, although I believe it only happens if the requests are launched at the same time in different slots. Also, remember that the token speed is an average measured from the start of the task to the end of the task. Sorry for editing your comment; I sometimes confuse quote with edit in the GitHub Android app.
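To see why an average taken from task start to task end can look slower when requests run in parallel slots, here is a minimal sketch. It assumes, for illustration only, that the GPU has a fixed total token throughput split evenly among slots that start together. Real batched decoding does not divide throughput this simply. The function names are hypothetical.

```python
def mean_tokens_per_second(n_tokens: int, start_s: float, end_s: float) -> float:
    """Average speed over the whole task: tokens divided by wall-clock duration."""
    return n_tokens / (end_s - start_s)


def simulate_slots(n_slots: int, tokens_per_request: int, gpu_tokens_per_s: float) -> list[float]:
    """Hypothetical model: a fixed GPU throughput is shared evenly by slots
    that start at t=0, so every task takes n_slots times longer to finish."""
    duration = n_slots * tokens_per_request / gpu_tokens_per_s
    return [mean_tokens_per_second(tokens_per_request, 0.0, duration) for _ in range(n_slots)]
```

With `gpu_tokens_per_s=100.0` and 200-token requests, one slot reports `[100.0]`. Two concurrent slots each report `50.0`, even though their combined throughput is still 100 tokens/s. A per-task slowdown in this metric does not by itself prove that the server got slower.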
I've also seen this issue with ./server when used with … The first request runs at the expected full speed, but subsequent requests generate more slowly (with an identical prompt).
This issue was closed because it has been inactive for 14 days since being marked as stale.
Was this fixed? I'm still having this issue.
I think it should be fixed. If you reproduce it, please provide the
Pre-Prerequisite
Thanks to all the contributors for all the great work on llama.cpp!
Prerequisites
Expected Behaviour
Current Behaviour
prompt_eval time gets much slower.
Environment and Context
Physical (or virtual) hardware you are using: Physical hardware, Nvidia GPU
Operating System: Linux
Failure Information (for bugs)
Please help provide information about the failure / bug.
Steps to Reproduce
Thanks!