Yes, this would speed up requests immensely. I don't know exactly how these models work, but it seems oddly slow to add each token to the network. Is the part of the response that echoes a copy of the input actually the first prediction pass, so the model effectively re-predicts the tokens you fed it and has to go through this process for every query? I suspect this because it emits the input tokens at the same rate as the actual response tokens.
So perhaps the model is reloaded on every request because it needs to be in a "clean state" for the next one; otherwise it would carry the previous request's state into the new response?
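As a toy illustration of the distinction being guessed at here (not any particular implementation): prompt tokens only need to be *evaluated* to fill the attention cache, which can happen in one batched pass, while response tokens must be *sampled* one at a time, each conditioned on the previous one. An implementation that echoes the prompt at the same rate as the response is likely pushing the prompt through the same one-token-at-a-time loop. The `forward` function below is a hypothetical stand-in for a transformer forward pass.

```python
def forward(tokens, cache):
    """Stand-in for a model forward pass: extend the cache with the new
    tokens and 'predict' a next token deterministically (toy logic)."""
    cache = cache + list(tokens)
    next_token = sum(cache) % 100
    return next_token, cache

def answer(prompt_tokens, n_new):
    # Fast path: one batched call evaluates the whole prompt at once,
    # so the prompt never needs to be re-predicted token by token.
    next_token, cache = forward(prompt_tokens, [])
    out = []
    for _ in range(n_new):
        # Slow path: each response token requires its own forward call,
        # because it depends on the token sampled just before it.
        out.append(next_token)
        next_token, cache = forward([next_token], cache)
    return out
```

Under this sketch, only the `n_new` response tokens pay the per-token cost; the prompt is absorbed in a single call.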
Hello
Is it possible to keep the model loaded? Every request takes longer because the model also has to be loaded each time.
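A common way to avoid the per-request load cost is a load-once, serve-many process: the weights are loaded at startup and every request reuses them, with only the per-request context reset between calls. Below is a minimal sketch of that pattern; `load_model` and `generate` are hypothetical placeholders for whatever inference API is actually in use here, not real functions from this project.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def load_model(path):
    # Placeholder: in a real server this would load the weights once.
    return {"path": path}

def generate(model, prompt):
    # Placeholder: run inference with a fresh per-request context, so no
    # state leaks between requests (reset the context, not the weights).
    return "response for: " + prompt

MODEL = load_model("model.bin")  # loaded once, reused by every request

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        prompt = self.rfile.read(length).decode()
        reply = generate(MODEL, prompt)  # no model load on this path
        self.send_response(200)
        self.end_headers()
        self.wfile.write(reply.encode())

# To run as a persistent server (commented out so the sketch stays inert):
# HTTPServer(("localhost", 8080), Handler).serve_forever()
```

The key point is only that the expensive load happens once, outside the request handler; the serving mechanism (HTTP, socket, pipe) is incidental.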