Potential memory leak on reloading model #72
Hi @jriegner, thank you for this report! You're right, there's a leak, and I think I found where it is 😄
When the model is deleted, the session is not automatically closed (it's the old … I think we have two options: …
I kind of prefer the second approach. Right now I'm travelling, so I don't know when I'll be able to work on it (I'm setting up the development environment right now while on a train, lol). If you want to implement the second option, I'll be more than happy to review and merge it.
Update: I think I've done it in b536202. Give it a try and let me know if it works.
Hey, thanks for your fast reply! I ran my local test app with the updated code; I will also check with our production setup and report back. Edit: …
If it's not the leak related to the missing session close, I have no other clue 🤔 There was definitely a leak caused by the missing session.close(), which should now be fixed (I forgot to add the same line in the …). But I have no idea what could cause this issue on the tfgo side; maybe it's a leak in the TensorFlow C library.
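For the reload scenario in this issue, a minimal sketch of closing the previous session explicitly, instead of waiting for the garbage collector, might look like the following. `Session`, `Holder`, `Reload`, and `Use` are hypothetical stand-ins, not tfgo's API; in real code the object being closed would be the TensorFlow session owned by the loaded model.

```go
package main

import (
	"fmt"
	"sync"
)

// Session is a hypothetical stand-in for an object that owns native
// (C-side) TensorFlow resources.
type Session struct {
	name   string
	closed bool
}

// Close releases the hypothetical native resources held by the session.
func (s *Session) Close() error {
	s.closed = true
	return nil
}

// Holder guards the currently active session so it can be swapped on reload.
type Holder struct {
	mu      sync.RWMutex
	current *Session
}

// Reload swaps in the new session and then closes the old one. Acquiring the
// write lock waits for every in-flight Use call, so nobody is still using the
// old session when it is closed.
func (h *Holder) Reload(next *Session) error {
	h.mu.Lock()
	old := h.current
	h.current = next
	h.mu.Unlock()
	if old != nil {
		return old.Close()
	}
	return nil
}

// Use runs fn with the current session while holding the read lock, so
// Reload cannot close that session while fn is running.
func (h *Holder) Use(fn func(s *Session)) {
	h.mu.RLock()
	defer h.mu.RUnlock()
	fn(h.current)
}

func main() {
	h := &Holder{}
	v1 := &Session{name: "v1"}
	v2 := &Session{name: "v2"}
	_ = h.Reload(v1)
	_ = h.Reload(v2) // v1 is closed here, deterministically
	h.Use(func(s *Session) { fmt.Println("serving with", s.name) })
	fmt.Println("v1 closed:", v1.closed, "v2 closed:", v2.closed)
}
```

The read/write lock is one possible design choice: requests run inside `Use`, so closing the old session never races with a request that is still using it.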
Alright, I will update the version on my side and give it a try. Thanks for your support, maybe the latest update fixed it 👍🏻
I tried it, but it doesn't work!
Hi @LoveVsLike, I think the problem is inside the TensorFlow bindings. As you can see, in tfgo we just open the session and use a finalizer to close it when the model is garbage-collected. So the problem is in the TensorFlow code, I guess. Perhaps you could open an issue on https://github.com/tensorflow/tensorflow and link this thread there; maybe someone from the TensorFlow team can help us. Anyway, since tfgo is still using TensorFlow 2.9.1, I can try to update my fork to 2.14. Maybe the leak is already fixed and we don't know it (though I'm not confident, since it has been years and this leak is still present version after version...). I'll update the fork and let you know.
Hi, I updated TF and tfgo to the latest versions, but it still doesn't work. Can you open an issue with TensorFlow?
Hey guys,
We use `tfgo` and noticed an increase in memory usage each time our model gets reloaded. We have a running service which periodically checks whether the model has been updated and reloads it. I wouldn't expect the memory usage to increase, since the model in memory should be replaced by the updated one. The code to load the model is:
But our monitoring shows that the usage goes up every time the model gets reloaded (once per hour). I profiled the service with `pprof` and could not see any of the internal components in our code with significantly growing memory usage.

Furthermore, I built TensorFlow 2.9.1 with debug symbols and wrote a small Go app that just reloads the model, so I could check for memory leaks with `memleak-bpfcc` from https://github.com/iovisor/bcc. This gave me the following stack trace, which I believe shows that memory is being leaked:

As you can see, this stack trace shows calls to `tfgo` and to the underlying TensorFlow library. I am not sure if I read it right, but it seems like there is a leak in `tfgo` or in TensorFlow itself.

Is there a way to explicitly release the memory of a loaded model when we reload? Could it be a problem in `tfgo`? If you need more information on this, please tell me.
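A likely reason `pprof` showed nothing: Go heap profiles only sample allocations made by the Go runtime, so memory that TensorFlow's C library allocates through cgo does not appear in them. A minimal, Linux-only sketch for confirming this is to log process RSS next to the Go runtime's own total after each reload. `rssBytes` and `goRuntimeBytes` are hypothetical helper names; the data comes from `/proc/self/statm` (second field = resident pages) and `runtime.MemStats.Sys`.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// rssBytes returns the resident set size of the current process on Linux,
// read from /proc/self/statm, whose second field is the RSS in pages.
func rssBytes() (uint64, error) {
	data, err := os.ReadFile("/proc/self/statm")
	if err != nil {
		return 0, err
	}
	fields := strings.Fields(string(data))
	if len(fields) < 2 {
		return 0, fmt.Errorf("unexpected /proc/self/statm format: %q", data)
	}
	pages, err := strconv.ParseUint(fields[1], 10, 64)
	if err != nil {
		return 0, err
	}
	return pages * uint64(os.Getpagesize()), nil
}

// goRuntimeBytes returns the total memory the Go runtime has obtained from
// the OS (MemStats.Sys); C allocations made through cgo are not included.
func goRuntimeBytes() uint64 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.Sys
}

func main() {
	// Call this after every model reload; if RSS keeps climbing while the
	// Go runtime figure stays flat, the growth is outside the Go heap.
	rss, err := rssBytes()
	if err != nil {
		fmt.Println("cannot read RSS:", err)
		return
	}
	fmt.Printf("RSS: %d MiB, Go runtime: %d MiB\n", rss>>20, goRuntimeBytes()>>20)
}
```

The absolute numbers are only a rough guide (RSS also counts code pages of shared libraries such as the TensorFlow library); the trend across reloads is what matters.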
Thanks in advance :)