
Possible memory leak with version 0.3.0 #251

Closed
stevenlyde opened this issue Mar 23, 2021 · 10 comments · Fixed by #253

@stevenlyde

When upgrading from version 0.2.0 to 0.3.0 we are seeing a possible memory leak. I have attached a small sample project that demonstrates the issue. It contains a unit test that loops forever but should be properly closing every resource it uses. If you run mvn clean install and observe the memory usage of the process, you should see it consume more and more memory. We did not observe this behavior with the same code on version 0.2.0.
example.zip
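
For reference, here is a minimal sketch of the kind of loop the test runs, assuming TF Java 0.3.0's TString API; the byte array and class names are illustrative stand-ins for the attached project:

import java.nio.charset.StandardCharsets;
import org.tensorflow.types.TString;

public class LeakRepro {
    public static void main(String[] args) {
        byte[] imageBytes = new byte[1024 * 1024]; // stand-in for a real image
        while (true) {
            // The tensor is created and closed on every iteration, so process
            // memory should stay flat; under 0.3.0 it keeps growing instead.
            try (TString value = TString.scalarOf(new String(imageBytes, StandardCharsets.ISO_8859_1))) {
                // the real test would feed the tensor to a session here
            }
        }
    }
}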

@rnett
Contributor

rnett commented Mar 24, 2021

Any non-standard JVM args (e.g. javacpp.noPointerGc)? Because at a glance, what you have seems correct.

Unrelated, but if this is actual code, you should probably use a placeholder for the image string and persist the session, using a new runner for each new image (and feeding the image to the runner); see the sketch below.
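
A hedged sketch of that pattern, assuming TF Java 0.3.0's Graph/Session API; the Transformer class, the model wiring, and the "output" fetch name are placeholders rather than code from the attached project:

import java.nio.charset.StandardCharsets;
import org.tensorflow.Graph;
import org.tensorflow.Session;
import org.tensorflow.Tensor;
import org.tensorflow.op.Ops;
import org.tensorflow.op.core.Placeholder;
import org.tensorflow.types.TString;

public class Transformer implements AutoCloseable {
    private final Graph graph = new Graph();
    private final Session session;
    private final Placeholder<TString> input;

    public Transformer() {
        Ops tf = Ops.create(graph);
        input = tf.placeholder(TString.class); // image is fed at run time
        // ... build the rest of the model from input here ...
        session = new Session(graph); // one session, reused across images
    }

    public void transform(byte[] imageBytes) {
        try (TString image = TString.scalarOf(new String(imageBytes, StandardCharsets.ISO_8859_1))) {
            session.runner()                // fresh runner per image
                   .feed(input, image)
                   .fetch("output")         // placeholder fetch target
                   .run()
                   .forEach(Tensor::close); // close fetched tensors as well
        }
    }

    @Override
    public void close() {
        session.close();
        graph.close();
    }
}

This avoids rebuilding the graph on every call; only the input tensor and the fetched tensors are per-image.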

@Craigacp
Collaborator

What platforms are you seeing this behaviour on (CPU/GPU, OS)? Do you have a rough idea of how much memory it's losing per iteration? Is it image-sized, or something smaller?

Looking through the code sample, I agree that it shouldn't be leaking. Is it the Java heap that's increasing, or the native off-heap memory?

@karllessard
Collaborator

Thanks a lot for your example @stevenlyde. Changes were applied in 0.3.0 in an attempt to solve other leaks that some users observed in 0.2.0, so it's interesting that in your case the previous version was working fine.

Since I'm already looking at the other issue, I'll take a look at yours while I'm at it; chances are that they are related.

@stevenlyde
Author

> Any non-standard JVM args (e.g. javacpp.noPointerGc)? Because at a glance, what you have seems correct.

I am not specifying any JVM args.

> What platforms are you seeing this behaviour on (CPU/GPU, OS)? Do you have a rough idea of how much memory it's losing per iteration? Is it image-sized, or something smaller?

I have observed the issue on CPU, running locally on a Mac. We also observed the behavior on our servers, which run on CPU under Amazon Linux.

> Is it the Java heap that's increasing, or the native off-heap memory?

It is native off-heap memory. Using VisualVM I can see that the Java heap is not growing, but using top I can see the process memory increasing.

@Craigacp
Collaborator

Thanks for the information, we'll try and run it down further.

@stevenlyde
Author

> Do you have a rough idea of how much memory it's losing per iteration? Is it image-sized, or something smaller?

I am not sure how much is being leaked each iteration, but you can see several hundred megabytes leaked after about 10 seconds of running the test.

Thank you so much for all of your support! You guys are awesome!

@rnett
Contributor

rnett commented Mar 24, 2021

Ok, I've reproduced it and narrowed the issue down to the tensor constant creation.

while (true) {
    val value = TUint8.vectorOf(*ImageTransformerTest.IMAGE_BYTES)
    value.close()
}

This is my baseline; it works fine.

while (true) {
    val value = TString.tensorOf(StandardCharsets.ISO_8859_1, NdArrays.scalarOfObject(String(ImageTransformerTest.IMAGE_BYTES, StandardCharsets.ISO_8859_1)))
    value.close()
}

This (copied from Constants.scalarOf(Scope, Charset, String)) fails.

It can be reduced down to:

while (true) {
    val value = TString.scalarOf(String(ImageTransformerTest.IMAGE_BYTES))
    value.close()
}

I can eliminate the NdArrays.scalarOfObject call, so it's definitely TString.tensorOf.

@saudet
Contributor

saudet commented Mar 24, 2021

This looks like a leak in TF Core somewhere. There's no output from the command below, so there are no resources registered with JavaCPP that are not being closed:

mvn clean test -Dorg.bytedeco.javacpp.logger.debug -DargLine=-Xmx200m 2>&1 | grep Collecting | grep -v 'ownerAddress=0x0'

If it's not a leak in TF Core, then we're getting objects somewhere from TF Core for which deallocators have not been registered with JavaCPP...

> I can eliminate the NdArrays.scalarOfObject call, so it's definitely TString.tensorOf.

Hum, sounds like something with the new code for string tensors...

@saudet
Copy link
Contributor

saudet commented Mar 24, 2021

Yeah, it looks like we need to explicitly deallocate the TF_TString pointers allocated with TF_TString_Init() here:
https://github.com/tensorflow/java/blob/master/tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/internal/buffer/ByteSequenceTensorBuffer.java
I'll create an Abstract_TF_TString or something to register deallocators with JavaCPP, but ideally we need to make ByteSequenceTensorBuffer AutoCloseable and figure out how to wire that into Tensor.close() @karllessard
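
For illustration, a rough sketch (not the actual fix) of the JavaCPP idiom being described, modeled on the presets' existing Abstract* classes. It assumes the generated TF_TString would extend this class, that its default constructor allocates one native struct, and that TF_TString_Dealloc is mapped; all names here are illustrative:

import org.bytedeco.javacpp.Pointer;
import org.tensorflow.internal.c_api.TF_TString;
import static org.tensorflow.internal.c_api.global.tensorflow.*;

public abstract class Abstract_TF_TString extends Pointer {

    public Abstract_TF_TString(Pointer p) { super(p); }

    // The deallocator is itself a Pointer sharing the target's address, so the
    // original wrapper stays collectable while JavaCPP keeps what it must free.
    protected static class DeleteDeallocator extends TF_TString implements Pointer.Deallocator {
        DeleteDeallocator(TF_TString s) { super(s); }
        @Override public void deallocate() {
            if (!isNull()) TF_TString_Dealloc(this);
            setNull();
        }
    }

    /** Allocates and initializes a TF_TString whose memory JavaCPP will release. */
    public static TF_TString newInitialized() {
        TF_TString s = new TF_TString();         // assumes this allocates the struct
        TF_TString_Init(s);
        s.deallocator(new DeleteDeallocator(s)); // freed on GC or explicit close()
        return s;
    }
}

Making ByteSequenceTensorBuffer AutoCloseable on top of that would let Tensor.close() release the strings eagerly instead of waiting for the garbage collector.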

@karllessard
Collaborator

Version 0.3.1 has been released this morning to fix this issue. I'm now closing it, but please reopen it if you are still facing any problems. Thanks for reporting it @stevenlyde, and for fixing it @saudet.
