Scanning a project with many DLLs is slow #3455
Comments
An initial glance shows that what's taking time is the
The time spent reading that many DLLs makes sense. However, looking at the raw syft results, I think a few things need to be verified:
I see a lot of packages (46,897!), and many of them are duplicated, some hundreds of times. We try to keep distinct project dependency graphs, so this may be correct, but it would be worth double-checking this result too. Back to the topic at hand, performance: I don't see how we could scan that many DLLs in only seconds, so that doesn't seem like the right answer here. The
When we ignore DLLs (and so look purely at the deps.json files), we see much better performance:
As a workaround, @KeylinxTobias, can you try running syft with
Side note: I wonder if we should have a
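To double-check the duplicate counts mentioned above, a small Go program can tally repeated packages in Syft's JSON output. This is a sketch, assuming the `syft <target> -o json` format with a top-level `artifacts` array whose entries carry `name`, `version`, and `type`; the helper `duplicateCounts` is hypothetical, and it treats two entries as duplicates only when type, name, and version all match, ignoring locations and every other field.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"
)

// syftDoc models only the part of Syft's JSON output this sketch reads:
// the top-level "artifacts" array with each package's name, version, and type.
type syftDoc struct {
	Artifacts []struct {
		Name    string `json:"name"`
		Version string `json:"version"`
		Type    string `json:"type"`
	} `json:"artifacts"`
}

// duplicateCounts (hypothetical helper) counts how often each
// "type:name@version" key occurs and keeps only keys seen more than once.
func duplicateCounts(data []byte) (map[string]int, error) {
	var doc syftDoc
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, a := range doc.Artifacts {
		counts[fmt.Sprintf("%s:%s@%s", a.Type, a.Name, a.Version)]++
	}
	for k, n := range counts {
		if n < 2 {
			delete(counts, k) // deleting during range is safe in Go
		}
	}
	return counts, nil
}

func main() {
	// Usage: syft <target> -o json > sbom.json && go run . sbom.json
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: dupcount <syft-json-file>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	dups, err := duplicateCounts(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	keys := make([]string, 0, len(dups))
	for k := range dups {
		keys = append(keys, k)
	}
	// Most-duplicated packages first.
	sort.Slice(keys, func(i, j int) bool { return dups[keys[i]] > dups[keys[j]] })
	for _, k := range keys {
		fmt.Printf("%6d  %s\n", dups[k], k)
	}
}
```

Because distinct project dependency graphs can legitimately report the same package more than once, a high count is a prompt to inspect those entries' locations rather than proof of a bug.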
There is a request for a csproj cataloger: #1522
Does this issue happen on Windows?
@TimBrown1611 I don't believe this is a Windows-specific issue, no. Given @wagoodman ran
Hi @popey!
For discussion: this sounds like a user scanning a source directory that contains build artifacts. We don't want to include those build artifacts in the scan, since the build description files already tell us which packages we should report. This is similar to a Maven project, where you have a
This could also be related to how we use different catalogers when scanning image sources versus directories; there may be some per-directory-tree differences.
I have also observed this behavior on an offline build server. It could be traced back to repeated attempts to download certificate revocation lists for code-signing certificates. Of course, this happens only when the scanned files already include signed binaries.
Hey @rsphilk, your last comment mentions that you traced the slowness to downloading certificate revocation lists. Are you saying that Syft does this when you scan something with it? There is also a PR that improves performance when scanning DLLs; initial testing shows it may be twice as fast with significantly less memory allocation: #3563. This may help in your situation, too.
Hi @kzantow
Yes, that is what I observed.
That's a really interesting observation! Do you have an example repo or other artifact we could use to reproduce this behavior on an offline Windows VM or machine?
Do you have an ETA for when the PR will be merged? We can't upgrade syft because of the performance :(
As far as I understand it, the PR limits allocations, but not total unreclaimable memory. I don't think it is going to solve the other issues you have raised, and I'm not sure how upgrading would be blocked on that PR.
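Because Syft is written in Go, the distinction drawn above between allocations and unreclaimable memory corresponds to two different numbers in Go's `runtime.MemStats`: `TotalAlloc`, a cumulative counter of every byte ever allocated, and `HeapAlloc`, which right after a full collection approximates what is still reachable. In this sketch the `measure` helper and the `churn`/`hoard` workloads are hypothetical and unrelated to Syft's code; they only show that two workloads can allocate the same amount while retaining very different amounts, so cutting allocations mainly reduces garbage-collection work and does not by itself shrink what stays reachable.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink and kept are package-level so the buffers assigned to them
// must be heap-allocated and cannot be optimized away.
var (
	sink []byte
	kept [][]byte
)

// measure (hypothetical helper) runs f and returns the bytes allocated while it
// ran (delta of the cumulative TotalAlloc counter) and the change in live heap
// bytes after a full GC (delta of HeapAlloc), i.e. memory f left reachable.
func measure(f func()) (allocated uint64, retained int64) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	f()
	runtime.GC()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc,
		int64(after.HeapAlloc) - int64(before.HeapAlloc)
}

// churn allocates 8 MiB in 1 MiB buffers and keeps none of them.
func churn() {
	for i := 0; i < 8; i++ {
		sink = make([]byte, 1<<20)
	}
	sink = nil
}

// hoard allocates the same 8 MiB and keeps all of it reachable.
func hoard() {
	for i := 0; i < 8; i++ {
		kept = append(kept, make([]byte, 1<<20))
	}
}

func main() {
	a, r := measure(churn)
	fmt.Printf("churn: allocated %d KiB, retained %d KiB\n", a>>10, r>>10)
	a, r = measure(hoard)
	fmt.Printf("hoard: allocated %d KiB, retained %d KiB\n", a>>10, r>>10)
}
```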
I am using syft 1.16.0. I can't point to which commit or change caused it, but scanning a Windows machine became much slower (because of the mentioned cataloger).
@TimBrown1611, if I look at the commits from 1.16.0 to 1.19.0, I don't see anything that directly affected the Dotnet Portable Executable cataloger, which is the one scanning DLLs. There was a
Hi! I don't have an image, but I've scanned AMI
@rsphilk / @TimBrown1611, is it possible that the network connections and delays are due to Microsoft Windows SmartScreen? I am not a Windows expert, but I understand it can fingerprint binaries (programs and libraries) and check them against an online Microsoft database for known issues. If you're scanning an entire filesystem, I imagine Syft will open every binary, potentially triggering SmartScreen. Is that something worth considering?
Hi @popey, I do see big gaps in performance between different syft versions.
Our build environment is disconnected from the Internet for security reasons, and we want to produce a non-enriched SBOM using Syft. However, scanning with Syft is very slow. With verbose logging, we can see that some DLLs take a lot of time. We have checked the configuration options and disabled as much as possible, but it is still slow. It looks as if, when scanning each file, Syft tries to reach the Internet to check something and then times out. Producing the SBOM for a typical solution of ours can sometimes take about an hour. With other scanners like Trivy, it takes seconds.
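A back-of-the-envelope model makes the per-file timeout suspicion concrete. Assuming, hypothetically, that each affected file blocks once on a network lookup that on an offline host can only end in a timeout, the timeouts add linearly on top of the actual scanning work:

```math
T_{\text{scan}} \approx T_{\text{work}} + N_{\text{blocked}} \cdot t_{\text{timeout}}
```

With purely illustrative numbers (not measurements from this issue), $N_{\text{blocked}} = 240$ and $t_{\text{timeout}} = 15\,\text{s}$ already add $240 \times 15\,\text{s} = 3600\,\text{s} = 1\,\text{h}$, even if $T_{\text{work}}$ is only seconds. Under this model, avoiding the lookup or making it fail fast would remove most of the delay.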
We mainly use NuGet packages in our solution, and our build environment runs on Windows servers.
How to reproduce:
Thanks in advance.