Text generation - getting slower and slower? #105
This may be due to memory leak issues that pollute the graph. In that case, use a technique I use in the Cloud Run apps to reset it:

```python
tf.reset_default_graph()
sess.close()
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess)
```
Thank you. I suppose you mean to put this inside the loop to reset the graph periodically, e.g. every 100 iterations. Is that correct? I have tried this for 2 hours and the problem seems to have gone, so I guess that is indeed the cause. It would be great to have this fixed in a future release if possible, but for now I am happy with this temporary fix. Thanks again.
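The periodic-reset workaround can be sketched as follows. The `FakeSession` class and stand-in functions below are illustrative stubs (not part of gpt_2_simple) so the loop structure is runnable anywhere; in the real code they would be `tf.reset_default_graph()`, `sess.close()`, `gpt2.start_tf_sess()`, `gpt2.load_gpt2(sess)`, and `gpt2.generate(sess, prefix=...)`:

```python
class FakeSession:
    """Stand-in for a TF session whose default graph grows on every call."""
    def __init__(self):
        self.graph_nodes = 0

def start_sess():
    # real code: sess = gpt2.start_tf_sess()
    return FakeSession()

def load_model(sess):
    # real code: gpt2.load_gpt2(sess)
    sess.graph_nodes += 1

def generate(sess, prefix):
    # real code: gpt2.generate(sess, prefix=prefix, return_as_list=True)
    sess.graph_nodes += 1  # models the ops leaked into the graph per call
    return prefix + " ..."

RESET_EVERY = 100  # reset interval is a judgment call, not a fixed requirement
sess = start_sess()
load_model(sess)

outputs = []
for i, prefix in enumerate(["some sentence"] * 250):
    if i > 0 and i % RESET_EVERY == 0:
        # real code: tf.reset_default_graph(); sess.close()
        sess = start_sess()
        load_model(sess)
    outputs.append(generate(sess, prefix))
```

Without the reset, the graph would grow without bound; with it, the graph is rebuilt from scratch every `RESET_EVERY` iterations, which matches the behavior reported in this thread (memory and per-call time stop climbing).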
Related: it would be great if prefix could be entered as a list, with the elements of the list processed and returned in parallel -- like batch, but with different prefixes. The time generation is taking seems way too long.
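One way the requested feature could work is to encode each prefix and pad to a common length so the prefixes stack into a single batch. This is a minimal sketch of the padding step only; the token IDs, `pad_id`, and the helper name are illustrative, and a real GPT-2 pipeline would use its BPE encoder and the model's actual padding convention:

```python
def pad_batch(encoded_prefixes, pad_id=0):
    """Left-pad encoded prefixes to a common length so they form one batch."""
    max_len = max(len(p) for p in encoded_prefixes)
    # left-padding keeps the last real token of every prefix aligned, which is
    # the position autoregressive generation continues from
    return [[pad_id] * (max_len - len(p)) + p for p in encoded_prefixes]

batch = pad_batch([[5, 9], [1], [7, 3, 2]])
# batch -> [[0, 5, 9], [0, 0, 1], [7, 3, 2]]
```

Each row then generates from a different prefix in the same forward pass, which is what makes batched generation faster than looping one prefix at a time.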
See #140
I am facing issues with generation time too, since in production the expectation is that it should be < 100 ms. Do you think TensorFlow Serving can help here? And how would encoding work during inference with TensorFlow Serving?
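On the encoding question: TensorFlow Serving only handles tensors, so BPE encoding and decoding would have to happen in the client (or a preprocessing service) on either side of the request. A minimal sketch of the client side, where the endpoint URL and model name are assumptions about the deployment:

```python
import json

# Assumed deployment: a model exported as "gpt2" behind TF Serving's REST API.
MODEL_URL = "http://localhost:8501/v1/models/gpt2:predict"

def build_predict_payload(token_ids):
    # TF Serving's REST predict API wraps inputs in an "instances" list.
    # The client encodes text to token IDs before the request and decodes the
    # returned IDs afterwards -- the server never sees raw text.
    return json.dumps({"instances": [token_ids]})

payload = build_predict_payload([15496, 995])  # token IDs are illustrative
```

The payload would then be POSTed to `MODEL_URL`; whether this gets under 100 ms depends mostly on model size and hardware, not on the serving layer.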
Have you solved the speed problem you encountered with TF Serving?
I have fine-tuned a model and I am now using it to generate text. I am doing this on Google Colab, which has a default of 26 GB memory. My process loops through a collection of sentences, using each one as 'prefix' to generate a paragraph. However, this process is gradually getting slower and slower: each loop takes longer and longer to complete, although the sentences do not get any longer from loop to loop.
I don't really understand this, as I thought the text generation process should have constant performance each time the method is called?
My code looks as follows - have I used it in the wrong way?
EDIT: I also notice that memory usage is increasing during this time. When I started the process on my AWS server, configured solely for this purpose, it used only 5% of memory. It has now run for over 24 hours and is using 53% of memory. Why is that?
EDIT 2: I can confirm the pattern again. The process has been running for almost 2 days, and memory usage has gone up to 73%. It now takes 3 minutes to generate one output, up significantly from 10 seconds at the beginning.
Can anyone please help? This does not look normal to me.
Thanks
Log showing the increase in time taken. As you can see, the time per sentence goes from just 10 seconds to more than a minute.