Log4Shell scanner memory utilization #368

Closed
hrez opened this issue Dec 19, 2021 · 9 comments

hrez commented Dec 19, 2021

Log4Shell scanner consumes a lot of memory; I've seen it as high as 18 GB of resident use.
It appears to depend on the number of .jars it processes. Does it not release memory after each archive?

Measured by `/usr/bin/time -v ./log4shell s /`:
Maximum resident set size (kbytes): 17572356

Some systems just don't have that much memory available, which leads to OOM.

linux, v1.4.1-log4shell

freeqaz (Member) commented Dec 19, 2021

Awesome, thanks for this report! I'll check the code and see if there's anything obvious. If not, @breadchris, who has better Go-fu than me, should be able to look at this soon.

breadchris (Contributor) commented

@hrez thanks for reporting this! I have been considering how to optimize this; the current implementation is intentionally naive, written to get something out the door quickly for people to use.

I have a couple of ideas for optimization:

  1. For containers, we can scan only running processes for their open files and scan only the jars actually in use by JVM processes.
  2. Using Go channels, we can introduce concurrent scanning, which should considerably boost throughput (a rough sketch follows below).
  3. Introduce configurable scanning limits to prevent the OOM.
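
A minimal sketch of how ideas 2 and 3 might fit together, assuming the scanner walks a list of jar paths; `scanJar`, `scanAll`, and the worker count are hypothetical names, not the scanner's actual API:

```go
package main

import "sync"

// scanJar is a hypothetical stand-in for the scanner's per-archive logic:
// open the archive and look for vulnerable log4j classes.
func scanJar(path string) {}

// scanAll scans jarPaths with at most maxWorkers archives in flight at once,
// so concurrency is bounded by a configurable limit rather than unbounded.
func scanAll(jarPaths []string, maxWorkers int) {
	jobs := make(chan string)
	var wg sync.WaitGroup

	for i := 0; i < maxWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for path := range jobs {
				scanJar(path)
			}
		}()
	}

	for _, p := range jarPaths {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
}
```

Bounding the worker count caps how many archives are being decompressed at any moment, which is what keeps concurrency from making peak memory worse.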

Stay tuned for some updates :)

hrez (Author) commented Dec 21, 2021

Are containers somehow worse for scanning memory utilization?
Without addressing the memory use of zip (and zip-in-zip) scanning, running scans in parallel will only increase memory use. I've actually tried it, and that's the case.
The goal is to scan everything, or nearly everything; selective scanning risks missing vulnerable packages.
I don't think those ideas address the core issue.

breadchris (Contributor) commented

@hrez When I wrote about the optimizations, I was speaking more generally: a second pass over this scanner with a focus on performance (i.e. with tests and benchmarking) will shake out issues like the one you are seeing. I understand how that can be misleading when the topic of this issue is memory use.

I agree that selective scanning is risky, and I am not suggesting it as the default. The container-scanning proposal would give people the option to very quickly scan only the jars loaded by running JVM processes for the vulnerable library version. A number of companies I have talked to have implemented this specifically for containerized environments.

I think I might have identified the problem, though. There is a place in the code where I open a file but never close it. I'm going to release a quick patch for this.
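
The thread doesn't name the call site, but the usual Go pattern for the kind of leak described (a file opened and never closed) looks roughly like this; `scanFile` is a hypothetical stand-in for the scanner's per-file entry point:

```go
package main

import "os"

// scanFile is a hypothetical stand-in, not the scanner's real function.
func scanFile(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	// Without this deferred Close, every scanned file keeps its descriptor
	// and any buffers referencing it alive for the life of the process.
	defer f.Close()

	_, err = f.Stat() // placeholder for the real archive inspection
	return err
}
```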

hrez (Author) commented Dec 21, 2021

Looking forward to the patch.
An unclosed file was my first guess, and I went looking for one, but unfortunately didn't find any.

breadchris (Contributor) commented

https://github.com/lunasec-io/lunasec/releases/tag/v1.4.2-log4shell
Let me know if you can reproduce those memory issues.

breadchris (Contributor) commented

I need to improve the tests to include a large folder so that I can monitor memory usage.

If I had to guess though, that file not being closed is probably the issue.

hrez (Author) commented Dec 21, 2021

That didn't do anything for memory usage.
I tracked it down to the processing of huge .jars, bigger than 1 GB.
It's basically in-memory decompression, followed by decompression of the embedded archives.
I think this is just how the GC works; it's also not quick to release memory back to the OS.
Not sure what the solution is here other than decompressing to disk.
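
For context, a sketch of the pattern that forces nested jars into memory (this illustrates the general `archive/zip` constraint, not necessarily the scanner's exact code): `zip.NewReader` needs an `io.ReaderAt` plus a size, and an entry opened from an outer jar is only an `io.Reader`, so it typically gets buffered in full first.

```go
package main

import (
	"archive/zip"
	"bytes"
	"io"
)

// openNestedJar is illustrative only: it shows why a jar nested inside
// another jar usually ends up fully decompressed in memory.
func openNestedJar(entry *zip.File) (*zip.Reader, error) {
	rc, err := entry.Open()
	if err != nil {
		return nil, err
	}
	defer rc.Close()

	// For a >1 GB jar this buffer alone accounts for most of the resident
	// set, and the Go runtime is not quick to return freed pages to the OS.
	data, err := io.ReadAll(rc)
	if err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(data), int64(len(data)))
}
```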

breadchris (Contributor) commented

Hmm, that is tricky indeed.

I could introduce a new branch in the decompression path that uncompresses to disk for large enough zips.

Thank you for identifying the problem; I should have no trouble replicating this locally and benchmarking performance.
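
One possible shape for that branch (an assumption, not necessarily what shipped): above a size threshold, stream the nested entry to a temporary file and hand the `*os.File` (already an `io.ReaderAt`) to `archive/zip`, so memory use stays at the copy buffer size instead of the full uncompressed jar.

```go
package main

import (
	"archive/zip"
	"io"
	"os"
)

// spillThreshold is an arbitrary illustrative value; a caller would take this
// path when entry.UncompressedSize64 exceeds it.
const spillThreshold = 64 << 20 // 64 MiB

func openLargeNestedJar(entry *zip.File) (*zip.Reader, func(), error) {
	rc, err := entry.Open()
	if err != nil {
		return nil, nil, err
	}
	defer rc.Close()

	tmp, err := os.CreateTemp("", "log4shell-*.jar")
	if err != nil {
		return nil, nil, err
	}
	cleanup := func() { tmp.Close(); os.Remove(tmp.Name()) }

	// Streaming copy: memory use stays at io.Copy's internal buffer size.
	size, err := io.Copy(tmp, rc)
	if err != nil {
		cleanup()
		return nil, nil, err
	}
	r, err := zip.NewReader(tmp, size)
	if err != nil {
		cleanup()
		return nil, nil, err
	}
	return r, cleanup, nil
}
```

The caller would invoke `cleanup` once it is done scanning the nested archive, so the temp file doesn't accumulate on disk the way the buffers accumulated in memory.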
