duplicate file in tar archive causes read to fail #1400
Comments
Interesting problem @deitch! The one issue I see is that Syft operates on |
That is true. And since both the first and second copies of I think that is better than the current "will not scan at all" behaviour. Perhaps the best would be if it could just read the entire tar archive as a stream, rather than extracting it, but I understand that there may be issues with that, like following links? I don't really know.
My instinct would be to give a WARN on duplicates but keep going, thus scanning the latest, with an option to fail-on-duplicates.
I am not sure it fits the expected behaviour. I think the user would want either to take the last entry, consistent with tar (default), or to error out because of an incomplete SBOM (option). Anything else doesn't fit with either.
@deitch good point -- if a user extracts the tar, and the behavior is always that the last entry wins, I'd agree a warning here is probably sufficient, as the first duplicate entry would essentially get overwritten. I'll bring this up with the team today and see if we can get some consensus -- if so, this sounds like a pretty simple change.
Is there anything I can do to help?
It looks like this is mostly fixed but not entirely. If the file being replaced is a symlink, then it tries to follow the symlink and replace its target, rather than the link itself. This is an issue in archiver, not in syft per se, so I will open an issue there and link it here.
See the linked issue. syft still uses archiver/v3, which is no longer supported. v4 doesn't have this issue, but it requires more work on the consumer's part.
@deitch reopened this to update to |
Hi @kzantow following up on this one. Any success? |
Please provide a set of steps on how to reproduce the issue
$ syft dir:/tmp/syft # or just `syft /tmp/syft`
$ syft file:/tmp/syft.tar
 ✔ Indexed /tmp/syft.tar
 ✔ Cataloged packages      [0 packages]
[0000]  WARN file could not be unarchived: reading file in tar archive: file already exists: /tmp/syft-archive-contents-1809756098/abc
No packages discovered
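For anyone who wants to build a comparable archive without the original files, here is a minimal sketch in Python. It assumes that two regular-file entries with the same name are enough to trigger the warning; the name `abc` comes from the log line above, while the payloads and the helper names are made up for illustration.

```python
import io
import tarfile


def build_tar_with_duplicate(name="abc"):
    """Return the bytes of a tar archive holding two entries with the same name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # The payloads are placeholders; only the repeated name matters.
        for payload in (b"first copy\n", b"second copy\n"):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def entry_names(data):
    """List every entry name in archive order, duplicates included."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        return [member.name for member in tar.getmembers()]


if __name__ == "__main__":
    data = build_tar_with_duplicate()
    print(entry_names(data))
```

Writing the returned bytes to a file such as `/tmp/syft.tar` should give an archive with the same shape as the one scanned above.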
What happened:
syft refuses to scan a tar file that contains duplicate entries. Because of tar's sequential structure (it is a tape archive 😁 ), duplicate entries are legitimate. Further, when extracting a tar file, tar generally writes later entries over earlier ones, unless explicitly set to fail on duplicates. However, syft fails outright.
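The "later entry wins" convention is not specific to the tar CLI. As an illustration only (this is Python's standard `tarfile` module, not syft or archiver), `TarFile.getmember` is documented to return the last occurrence when a name appears more than once. The helper names and payloads below are hypothetical.

```python
import io
import tarfile


def make_tar(entries):
    """Build an in-memory tar from (name, payload) pairs, in the given order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in entries:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_latest(data, name):
    """Return the content of the most recent entry called `name`."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        # getmember() resolves a repeated name to its last occurrence.
        return tar.extractfile(tar.getmember(name)).read()


if __name__ == "__main__":
    archive = make_tar([("abc", b"old\n"), ("abc", b"new\n")])
    print(read_latest(archive, "abc"))
```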
What you expected to happen:
I expected it to process the tar file successfully. At the least, there should be options to fail-on-duplicate or continue-on-duplicate (the default tar behaviour)
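To make the two requested behaviours concrete, here is a rough sketch of what such an option could look like, written in Python against the standard `tarfile` module. It is not syft's implementation; `read_entries`, `fail_on_duplicate` and `DuplicateEntryError` are hypothetical names. It reads the archive as a forward-only stream, warns and keeps the later copy by default, and raises when asked to fail on duplicates.

```python
import io
import logging
import tarfile

log = logging.getLogger("tar-scan-sketch")


class DuplicateEntryError(Exception):
    """Raised when fail_on_duplicate is set and an entry name repeats."""


def make_tar(entries):
    """Build an in-memory tar from (name, payload) pairs, in the given order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in entries:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_entries(data, fail_on_duplicate=False):
    """Stream a tar archive and return {name: content}; the last entry wins."""
    contents = {}
    # "r|" reads the archive strictly front to back, without seeking.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r|") as tar:
        for member in tar:
            if not member.isfile():
                continue  # links and directories are out of scope for this sketch
            if member.name in contents:
                if fail_on_duplicate:
                    raise DuplicateEntryError(member.name)
                log.warning("duplicate entry %s: keeping the later copy", member.name)
            contents[member.name] = tar.extractfile(member).read()
    return contents


if __name__ == "__main__":
    archive = make_tar([("abc", b"first\n"), ("abc", b"second\n")])
    print(read_entries(archive))
```

Defaulting to warn-and-continue mirrors what tar does on extraction, while the strict mode covers the case where a duplicate should be treated as an incomplete result.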
Anything else we need to know?:
No.
Environment:
Output of syft version: (although I tried it with v0.63.0 via docker as well)
OS (e.g. cat /etc/os-release or similar): (although I ran it inside the official syft docker images as well)