Log4Shell scanner memory utilization #368

Closed
hrez opened this issue Dec 19, 2021 · 9 comments

hrez commented Dec 19, 2021

Log4Shell scanner consumes a lot of memory; I've seen it as high as 18 GB of resident use.
It appears to depend on the number of .jars it processes. Does it not release memory after each archive?

Measured by `/usr/bin/time -v ./log4shell s /`:
Maximum resident set size (kbytes): 17572356

Some systems just don't have that much memory available, which leads to OOM.

linux, v1.4.1-log4shell

freeqaz (Member) commented Dec 19, 2021

Awesome, thanks for this report! I'll check the code and see if there's anything obvious. If not, @breadchris, who has better Go-fu than me, should be able to look at this soon.

breadchris (Contributor) commented

@hrez thanks for reporting this! I have been considering how to optimize this; the current implementation is intentionally naive, written to get something out the door quickly for people to use.

I have a couple of ideas for optimization:

  1. For containers, we can scan only running processes for their open files and scan only the jars actually in use by JVM processes.
  2. Using Go channels, we can introduce concurrent scanning, which should considerably boost throughput (a rough sketch follows below).
  3. Introduce configurable scanning limits to prevent the OOM.
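
A minimal sketch of how ideas 2 and 3 might fit together, assuming the scanner walks a list of jar paths; `scanJar`, `scanAll`, and the worker count are hypothetical names, not the scanner's actual API:

```go
package main

import "sync"

// scanJar is a hypothetical stand-in for the scanner's per-archive logic:
// open the archive and look for vulnerable log4j classes.
func scanJar(path string) {}

// scanAll scans jarPaths with at most maxWorkers archives in flight at once,
// so concurrency is bounded by a configurable limit rather than unbounded.
func scanAll(jarPaths []string, maxWorkers int) {
	jobs := make(chan string)
	var wg sync.WaitGroup

	for i := 0; i < maxWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for path := range jobs {
				scanJar(path)
			}
		}()
	}

	for _, p := range jarPaths {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
}
```

Bounding the worker count caps how many archives are being decompressed at any moment, which is what keeps concurrency from making peak memory worse.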

Stay tuned for some updates :)

hrez (Author) commented Dec 21, 2021

Are containers somehow worse for scanning memory utilization?
Without addressing the memory use of zip (and zip-in-zip) scanning, running scans in parallel will only increase memory use. I've actually tried it, and that's the case.
The goal is to scan everything, or nearly everything; selective scanning risks missing vulnerable packages.
I don't think those ideas address the core issue.

breadchris (Contributor) commented

@hrez When I wrote about the optimizations, I was speaking more generally: a second pass over this scanner with a focus on performance (i.e. with tests and benchmarking) will shake out issues like the one you are seeing. I understand how that can be misleading when the topic of this issue is memory use.

I agree that selective scanning is risky, and I am not suggesting it as the default. The container-scanning proposal would give people the option to very quickly scan only the jars loaded by running JVM processes for the vulnerable library version. A number of companies I have talked to have implemented this specifically for containerized environments.

I think I might have identified the problem, though. There is a place in the code where I open a file but never close it. I'm going to release a quick patch for this.
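
The thread doesn't name the call site, but the usual Go pattern for the kind of leak described (a file opened and never closed) looks roughly like this; `scanFile` is a hypothetical stand-in for the scanner's per-file entry point:

```go
package main

import "os"

// scanFile is a hypothetical stand-in, not the scanner's real function.
func scanFile(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	// Without this deferred Close, every scanned file keeps its descriptor
	// and any buffers referencing it alive for the life of the process.
	defer f.Close()

	_, err = f.Stat() // placeholder for the real archive inspection
	return err
}
```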

hrez (Author) commented Dec 21, 2021

Looking forward to the patch.
An unclosed file was my first guess, and I went looking for one, but unfortunately didn't find any.

breadchris (Contributor) commented

https://github.com/lunasec-io/lunasec/releases/tag/v1.4.2-log4shell
Let me know if you can reproduce those memory issues.

breadchris (Contributor) commented

I need to improve the tests to include a large folder so that I can monitor memory usage.

If I had to guess though, that file not being closed is probably the issue.

hrez (Author) commented Dec 21, 2021

That didn't do anything for memory usage.
I tracked it down to the processing of huge .jars, bigger than 1 GB.
It's basically in-memory decompression, followed by decompression of the embedded archives.
I think this is just how the GC works; it's also not quick to release memory back to the OS.
Not sure what the solution is here other than decompressing to disk.
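
For context, a sketch of the pattern that forces nested jars into memory (this illustrates the general `archive/zip` constraint, not necessarily the scanner's exact code): `zip.NewReader` needs an `io.ReaderAt` plus a size, and an entry opened from an outer jar is only an `io.Reader`, so it typically gets buffered in full first.

```go
package main

import (
	"archive/zip"
	"bytes"
	"io"
)

// openNestedJar is illustrative only: it shows why a jar nested inside
// another jar usually ends up fully decompressed in memory.
func openNestedJar(entry *zip.File) (*zip.Reader, error) {
	rc, err := entry.Open()
	if err != nil {
		return nil, err
	}
	defer rc.Close()

	// For a >1 GB jar this buffer alone accounts for most of the resident
	// set, and the Go runtime is not quick to return freed pages to the OS.
	data, err := io.ReadAll(rc)
	if err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(data), int64(len(data)))
}
```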

breadchris (Contributor) commented

Hmm, that is tricky indeed.

I could introduce a new branch in the decompression path that uncompresses to disk for large enough zips.

Thank you for identifying the problem; I should have no trouble replicating this locally and benchmarking performance.
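
One possible shape for that branch (an assumption, not necessarily what shipped): above a size threshold, stream the nested entry to a temporary file and hand the `*os.File` (already an `io.ReaderAt`) to `archive/zip`, so memory use stays at the copy buffer size instead of the full uncompressed jar.

```go
package main

import (
	"archive/zip"
	"io"
	"os"
)

// spillThreshold is an arbitrary illustrative value; a caller would take this
// path when entry.UncompressedSize64 exceeds it.
const spillThreshold = 64 << 20 // 64 MiB

func openLargeNestedJar(entry *zip.File) (*zip.Reader, func(), error) {
	rc, err := entry.Open()
	if err != nil {
		return nil, nil, err
	}
	defer rc.Close()

	tmp, err := os.CreateTemp("", "log4shell-*.jar")
	if err != nil {
		return nil, nil, err
	}
	cleanup := func() { tmp.Close(); os.Remove(tmp.Name()) }

	// Streaming copy: memory use stays at io.Copy's internal buffer size.
	size, err := io.Copy(tmp, rc)
	if err != nil {
		cleanup()
		return nil, nil, err
	}
	r, err := zip.NewReader(tmp, size)
	if err != nil {
		cleanup()
		return nil, nil, err
	}
	return r, cleanup, nil
}
```

The caller would invoke `cleanup` once it is done scanning the nested archive, so the temp file doesn't accumulate on disk the way the buffers accumulated in memory.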
