[Version 0.2.50] Models get unloaded from memory after each query #181
Comments
+1 to this! I'm using version 0.3.1 and have experienced the same issue with models getting unloaded from memory after each query. It would be great if there was an option to keep the models loaded in memory to reduce the overhead and make it more efficient to use the web interface. Thanks for considering this feature request!
If this is true, and this currently loads the entire model into memory and then deallocates it after every prompt, I would recommend making this the #1 project priority to sort out. It is extremely inefficient and even makes the application unusable for any large weights or non-beefy PCs. Anyone have an idea how to tackle this?
I have the same problem. Could anyone open a PR to solve this? It's a major issue.
I am testing the llama 30B model and noticed that whenever I write a query it takes a long time to load the model into memory, then it writes the response, THEN it deallocates the used memory. I think this overhead will make it inefficient to use the web interface. This was also noticeable when choosing alpaca 7B, but the effect isn't as pronounced given the smaller model.
Would appreciate an option to keep the models loaded in memory to eliminate this overhead.
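To illustrate the difference being requested, here is a minimal Python sketch contrasting the reload-per-query pattern described above with a load-once, keep-resident pattern. `load_model` and `generate` are hypothetical stand-ins for the expensive weight load and the inference call — they are not llama.cpp's actual API.

```python
import time

def load_model(path):
    """Hypothetical stand-in for the expensive model load.
    A real 30B checkpoint would take many seconds here."""
    time.sleep(0.01)  # simulate the slow read of the weights from disk
    return {"path": path}

def generate(model, prompt):
    """Hypothetical stand-in for inference against loaded weights."""
    return f"response to {prompt!r} from {model['path']}"

def answer_reload_each_time(prompts, path):
    """The inefficient pattern described in the issue:
    load, answer, deallocate -- the load cost is paid per query."""
    outputs = []
    for p in prompts:
        model = load_model(path)          # paid once PER QUERY
        outputs.append(generate(model, p))
        del model                         # weights freed after every prompt
    return outputs

def answer_keep_loaded(prompts, path):
    """The requested behaviour: load once, keep the weights
    resident across all queries in the session."""
    model = load_model(path)              # paid once per session
    return [generate(model, p) for p in prompts]
```

The two functions return identical outputs; the only difference is how many times the load cost is paid, which is exactly the overhead the original report is about.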