
Text generation - getting slower and slower? #105

Open · ziqizhang opened this issue Aug 15, 2019 · 6 comments

ziqizhang commented Aug 15, 2019

I have fine-tuned a model and am now using it to generate text on Google Colab (26 GB of RAM by default). My process loops through a collection of sentences, using each one as the 'prefix' to generate a paragraph. However, the process gets gradually slower: each iteration of the loop takes longer than the last, even though the input sentences do not get any longer.

I don't really understand this, as I thought text generation should take roughly constant time each time the method is called.

My code is below. Have I used the library in the wrong way?

EDIT: I have also noticed that memory usage keeps increasing during this time. When I started the process on an AWS server dedicated to this task, it used only 5% of memory. It has now been running for over 24 hours and is using 53%. Why is that?

EDIT 2: I can confirm the pattern again. The process has been running for almost 2 days and memory usage has gone up to 73%. It now takes 3 minutes to generate one output, up from about 10 seconds at the beginning.

Can anyone please help? This does not look normal to me.

Thanks

import csv
import datetime
import re

import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name='run1')

with open(outFile, 'a+', newline='\n') as f:
    writer = csv.writer(f, delimiter=",", quotechar='"')
    count = 0
    for l in lineList:
        print(str(datetime.datetime.now()) + "," + str(count))
        # Strip everything except alphanumerics before using the line as a prefix.
        l = re.sub('[^0-9a-zA-Z]+', ' ', l).strip()
        texts = gpt2.generate(sess, return_as_list=True,
                              temperature=1.0,
                              nsamples=2,
                              batch_size=2,
                              length=200,
                              prefix=l,
                              include_prefix=False)
        row = [l]
        for t in texts:
            # Drop the prefix if the generated text echoes it back.
            if t.startswith(l):
                t = t[len(l):].strip()
            row.append(t)

        writer.writerow(row)
        count += 1

Log showing the increase in time taken: per-sentence generation goes from about 10 seconds to more than a minute.

2019-08-15 12:49:49.720246,4162
2019-08-15 12:50:00.720310,4163
2019-08-15 12:50:11.065400,4164
2019-08-15 12:50:21.630609,4165
2019-08-15 12:50:32.572490,4166
2019-08-15 12:50:47.027083,4167
2019-08-15 12:50:58.078473,4168
2019-08-15 12:51:09.834870,4169
2019-08-15 12:51:21.490914,4170
2019-08-15 12:51:34.091284,4171
2019-08-15 12:51:48.238152,4172
2019-08-15 12:52:01.631092,4173
2019-08-15 12:52:14.451645,4174
2019-08-15 12:52:27.794607,4175
2019-08-15 12:52:43.495325,4176
.....
2019-08-15 15:23:28.228918,4391
2019-08-15 15:24:39.403824,4392
2019-08-15 15:25:48.217059,4393
2019-08-15 15:26:59.058952,4394
2019-08-15 15:28:09.956804,4395
2019-08-15 15:29:21.806861,4396
2019-08-15 15:30:30.500894,4397
2019-08-15 15:31:41.235117,4398
2019-08-15 15:32:49.256143,4399
minimaxir (Owner) commented Aug 18, 2019

This may be due to the memory-leak issues that pollute the graph. In that case, use the technique I use in the Cloud Run apps to reset it:

tf.reset_default_graph()
sess.close()
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess)

ziqizhang (Author) commented Aug 19, 2019

Thank you.

I suppose you mean putting this inside the loop to reset the graph periodically, e.g. every 100 iterations. Is that correct? Something like the sketch below.
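
A minimal sketch of what I have in mind (the every-100-iterations interval and run_name='run1' are just my own placeholders; lineList is my list of input sentences):

import tensorflow as tf
import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name='run1')

for count, l in enumerate(lineList):
    # Periodically tear down the polluted graph and reload the model, as suggested above.
    if count > 0 and count % 100 == 0:
        tf.reset_default_graph()
        sess.close()
        sess = gpt2.start_tf_sess()
        gpt2.load_gpt2(sess, run_name='run1')
    texts = gpt2.generate(sess, return_as_list=True,
                          temperature=1.0,
                          nsamples=2,
                          batch_size=2,
                          length=200,
                          prefix=l,
                          include_prefix=False)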

I have tried this for 2 hours and the problem seems to have gone, so I guess that is indeed the cause. It would be great to have this fixed in a future release if possible, but for now I am happy with this temporary fix.

Thanks again

@greatblueheron

Related: it would be great if prefix could be given as a list, with the elements processed and returned in parallel -- like a batch, but with different prefixes. The time generate takes seems far too long. A rough workaround sketch is below.
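
In the meantime, the closest thing I can see with the current gpt_2_simple API (where generate() takes a single prefix string) is to batch samples per prefix and loop over the prefixes; a rough sketch, with the prefix list and sampling parameters as placeholders:

import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name='run1')

prefixes = ["first prompt", "second prompt", "third prompt"]  # placeholder inputs
results = {}
for p in prefixes:
    # One generate() call per prefix; batch_size only parallelises samples
    # that share the same prefix, not different prefixes.
    results[p] = gpt2.generate(sess, return_as_list=True,
                               nsamples=4,
                               batch_size=4,
                               length=200,
                               prefix=p)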

@RandomStrangerOnTheInternet

See #140

@ramanshrivastava

I am facing issues with generation time too, since in production the expectation is that it should be < 100 ms. Do you think TensorFlow Serving can help here? And how would encoding work during inference with TensorFlow Serving?

@only-yao

> I am facing issues with generation time too, since in production the expectation is that it should be < 100 ms. Do you think TensorFlow Serving can help here?

Have you solved the speed problem you encountered with TF Serving?
