
Is it possible to keep model loaded #327

Open
Showgofar opened this issue Mar 30, 2023 · 2 comments

Comments

@Showgofar

Hello,
Is it possible to keep the model loaded?
Every request takes longer because the model also has to be loaded each time.
[screenshot attached]
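
For illustration, what I'm asking for might look like this: load the weights once at startup, then serve many requests against the resident model. This is just a rough sketch using llama-cpp-python as a stand-in backend, not this project's actual API; the model path and port are made up:

```python
# Sketch: load the weights once, then serve many requests against them.
# llama-cpp-python is used purely as an example backend here; the model
# path and port are hypothetical.
from http.server import BaseHTTPRequestHandler, HTTPServer
from llama_cpp import Llama

llm = Llama(model_path="./ggml-model-q4_0.bin")  # slow part: paid once, at startup

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        prompt = self.rfile.read(int(self.headers["Content-Length"])).decode()
        out = llm(prompt, max_tokens=128)        # reuses the already-loaded weights
        self.send_response(200)
        self.end_headers()
        self.wfile.write(out["choices"][0]["text"].encode())

HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```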

@michaelwdombek

Hey, I think there is already an open issue, #181, for your request :)

@64jcl

64jcl commented Apr 4, 2023

Yes, this would speed up requests immensely. I don't know exactly how these models work, but it seems oddly slow to feed each token through the network. Is the part of the response that echoes the input actually the first prediction pass, so the model effectively re-predicts the tokens you fed it and has to go through this process for every query? I suspect this because the echoed input is printed at the same rate as the actual response tokens.

So perhaps the model is reloaded every time because it needs to be in a "clean state" for the next request; otherwise the previous state would leak into it?
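
If that's the reason, a cheaper alternative to a full reload would be to keep the weights in memory and only clear the per-request state (the evaluated-token context). A sketch of the idea, again with llama-cpp-python as an illustrative backend; I'm assuming its `reset()` call clears that state without touching the weights:

```python
# Sketch: the weights and the per-request state are separate things.
# Reloading from disk is the expensive part; resetting the context is
# cheap. llama-cpp-python used illustratively; reset() is assumed to
# clear only the evaluated-token state, not the weights.
from llama_cpp import Llama

llm = Llama(model_path="./ggml-model-q4_0.bin")  # expensive: reads GBs from disk

for prompt in ["First question.", "Second, unrelated question."]:
    llm.reset()                                  # cheap: back to a "clean state"
    out = llm(prompt, max_tokens=64)
    print(out["choices"][0]["text"])
```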
