Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OOM in tests (Part 3) #42471

Closed
mschorsch opened this issue Aug 11, 2024 · 22 comments
Closed

OOM in tests (Part 3) #42471

mschorsch opened this issue Aug 11, 2024 · 22 comments
Assignees
Labels
area/testing kind/bug Something isn't working triage/out-of-date This issue/PR is no longer valid or relevant

Comments

@mschorsch
Copy link
Contributor

mschorsch commented Aug 11, 2024

Describe the bug

As stated in #42355 (comment), we continue to receive an OOM in our tests. I was able to create a reproducer for this.

Seems there is another memory/classloader leak.

Expected behavior

No response

Actual behavior

No response

How to Reproduce?

oom-reproducer3.zip

./gradlew test --console=plain

Output of uname -a or ver

Linux

Output of java -version

Java 21

Quarkus version or git rev

Quarkus 3.13.2

Build tool (ie. output of mvnw --version or gradlew --version)

Gradle 8.9

Additional information

No response

@mschorsch mschorsch added the kind/bug Something isn't working label Aug 11, 2024
@gsmet
Copy link
Member

gsmet commented Aug 11, 2024

That's awesome, thanks. I will have a closer look soon.

@gsmet
Copy link
Member

gsmet commented Aug 11, 2024

I have been chasing them for a while but it all depends on the extensions being around. I hope we'll be able to get to a more sustainable situation thanks to you.

@gsmet gsmet self-assigned this Aug 11, 2024
@gsmet
Copy link
Member

gsmet commented Aug 12, 2024

OK so I can reproduce it and this is a bit different. The issue is mostly related to Testcontainers threads and things are not leaking like crazy as they use to do (at least in my experiments).

Now, our CL objects are very large and they also keep open a gazillion of zip file systems, which explains the problem (I think).

@mschorsch
Copy link
Contributor Author

mschorsch commented Aug 12, 2024

Unfortunately, it is not easy to create a reproducer. In our real project I can easily produce an OOM (without testcontainers) unfortunately I could not extract a simple reproducer so far. I was hoping that this reproducer would at least partially solve the problem.

Overall, one would expect that if each test class is executed individually and sequentially, the memory requirements would remain the same (at least after a full GC). This is not the case here.

@gsmet
Copy link
Member

gsmet commented Aug 12, 2024

I pushed a first PR here ^.

It improves things a bit but doesn’t really fix the reproducer.

@gsmet
Copy link
Member

gsmet commented Aug 12, 2024

Do you think we could maybe organize a Google Meet call together?

If you can get a heap dump and install Eclipse MAT, I could show you how to get some info and we could make progress without you sending me the heap dump?

My guess is that it’s all going to be about class loaders so I won’t see any privileged info on your screen.

Let me know if that could work for you. My email is open at gsmet at redhat dot com.

@mschorsch
Copy link
Contributor Author

Hi @gsmet , sorry for the delay. Unfortunately this is currently not possible as I am on vacation. I will be back in September.

Nevertheless, thank you for your work and the first improvements in #42492 👍

@mschorsch
Copy link
Contributor Author

mschorsch commented Aug 15, 2024

@gsmet A positive message, I have checked your changes from #42492 again against our real application and I can confirm that the changes prevent an OOM from occurring 👍 .

@gsmet
Copy link
Member

gsmet commented Aug 15, 2024

Yeah, it still fails with your reproducer 3 though. There are several other things to fix.

#42525 should help more and then there's the issue of Testcontainers leaking things like crazy that I'm trying to solve.

@gsmet
Copy link
Member

gsmet commented Aug 16, 2024

@mschorsch FWIW, I was able to run your reproducer to completion with all these three applied:

Now, just be aware that they will probably require some discussions and I'm not convinced it will be a good move to push them to 3.14.
Time will tell.

@mschorsch
Copy link
Contributor Author

@gsmet Thanks. Looks promising

@gsmet
Copy link
Member

gsmet commented Aug 17, 2024

Unfortunately the Testcontainers one has little chance to be merged as it comes with some massive problem.

We will have some discussions when @geoand is back from PTO.

@afalhambra-hivemq
Copy link

Hey guys,
we've been having the same problem in our tests, since the @QuarkusTestResource was deprecated, tons of tests ran into an OOM issue.
And as per this comment, setting the restrictToAnnotatedClass = false in the @WithTestResource did work for us as a workaround though.

@gsmet
Copy link
Member

gsmet commented Aug 22, 2024

Yes I'm not surprised.

@gsmet
Copy link
Member

gsmet commented Aug 22, 2024

I tried to make the migration guide a lot more explicit: https://github.com/quarkusio/quarkus/wiki/Migration-Guide-3.13#quarkustestresource-replaced-by-withtestresource-gear-white_check_mark .

@afalhambra-hivemq would the new wording have helped you?

@gsmet
Copy link
Member

gsmet commented Aug 22, 2024

I also created #42700 to add more info to the Javadoc.

@afalhambra-hivemq
Copy link

afalhambra-hivemq commented Aug 22, 2024

I tried to make the migration guide a lot more explicit: https://github.com/quarkusio/quarkus/wiki/Migration-Guide-3.13#quarkustestresource-replaced-by-withtestresource-gear-white_check_mark .

@afalhambra-hivemq would the new wording have helped you?

Yes, definitely. Thanks a bunch. Looking forward to the fix.

@geoand
Copy link
Contributor

geoand commented Aug 26, 2024

Once #42525 is in, we need to recap where we are, ideally with some real world examples

@geoand
Copy link
Contributor

geoand commented Sep 5, 2024

@mschorsch we have changed the behavior of @WithTestResource in main, see #42852 for more details.

Any chance you can give it a spin?

@mschorsch
Copy link
Contributor Author

mschorsch commented Sep 6, 2024

@geoand I tested our application against #42852 with every option from TestResourceScope and it worked without a problem 🚀 .

In addition, I was even able to reduce the memory requirements for the tests, I assume due to #42525.

@geoand
Copy link
Contributor

geoand commented Sep 6, 2024

Awesome, thanks!

@geoand
Copy link
Contributor

geoand commented Sep 24, 2024

Let's close this for now and if similar problems occur in the future, we can open a new issue

@geoand geoand closed this as completed Sep 24, 2024
@geoand geoand added the triage/out-of-date This issue/PR is no longer valid or relevant label Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/testing kind/bug Something isn't working triage/out-of-date This issue/PR is no longer valid or relevant
Projects
None yet
Development

No branches or pull requests

4 participants