-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Supplement generated SBOMs with layer information post-build #3258
Conversation
To save space would it make sense to add this comment to the whole package instead of every file. Is there a use case where knowing location of every file is needed? 99+% of the cases this value would be the same for all files inside a package. |
if err != nil { | ||
return err | ||
} | ||
f.FileComment = fmt.Sprintf("layerID: %s", desc.Digest.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should we do if f.FileComment
is already commented?
I'm not sure if there's a standard for composing comments together, so maybe this isn't the right sort of place to put this information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have settled on not overwriting the FileComment
and simply skipping over this file if there is already a comment - given that there's no way to safely detect what to do in this scenario. If some sort of standard emerges, we can try and follow that, but as of now, it doesn't seem like there is one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wagoodman, should we use a different key here instead of layerID
? Ideally we wouldn't want to interfere with an existing tooling that expects the value here to be a ChainID, so it might make sense to pick something different, like maybe just layer
?
7bfc294
to
4e072a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the "draft" only because it is based on local attestations or smth missing?
c.pending = c.pending[:len(c.pending)-1] | ||
|
||
found := false | ||
for _, f := range files { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't handle whiteouts or opaque directories. Not a blocker for first PR but we need to track the missing pieces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure how we should handle whiteouts. Ideally, every file that we lookup exists in the final image - if it does, we should be safe to detect + ignore all whiteouts, skipping over them.
If we want to handle deleted files, we could return the details of the layer that deleted them, which should be just as easy, we'd just write those into the same cache
map.
) | ||
|
||
var intotoPlatform ocispecs.Platform = ocispecs.Platform{ | ||
Architecture: "unknown", | ||
OS: "unknown", | ||
} | ||
|
||
// supplementSBOM modifies SPDX attestations to include the file layers | ||
func supplementSBOM(ctx context.Context, s session.Group, target cache.ImmutableRef, targetRemote *solver.Remote, refs map[string]cache.ImmutableRef, att result.Attestation) (result.Attestation, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would be better with 2 functions in series. supplementSBOM()
should just take []byte
as an input.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Include a new ReadAll primitive, and rename Generate to MakeInTotoAttestations. Signed-off-by: Justin Chadwell <me@jedevc.com>
4e072a4
to
31295c8
Compare
Signed-off-by: Justin Chadwell <me@jedevc.com>
31295c8
to
733b5b1
Compare
⬆️ Follow-up to #2983.
📓 Because of interface changes, this PR depends on #3197.
BuildKit can generate SBOMs for images that it builds using scanner images: e.g. jedevc/buildkit-syft-scanner - these SBOMs are attached as image attestations to the final image. Any component in BuildKit that creates and modifies the build result, starting with the frontend, to the solver, or even the final exporter, can attach attestations, allowing flexibility such as #3249 (where we consume Dockerfile args to determine exactly what inputs to pass to the scanner). This flexibility will allow custom SBOMs to be generated by custom frontends in the future, if we go down that route.
However, this flexibility presents a challenge - we cannot provide layer identifiers to the scanner at the time of scanning. There are several reasons for this:
Layer identifiers only become known in the final step of image exporting, so we have to wait until then to supplement our SBOM with these identifiers. Modifying the SPDX generated by the SBOM scanner is not ideal - however, because of the above reasons, there's no way to provide this data to the scanner at time-of-scan (at least without a fundamental re-architecture of buildkit's frontend⟹backend⟹exporter which would likely break caching in major ways).
For now, these layer identifiers are added to the comment field of the
files
map, a la syft, e.g.:There is one major difference between this and the output produced by syft - we instead use digests from the descriptor, which is the digest of the layer tarball in the registry/content store instead of the chain id, which can be fiddly to use back-reference content when scanning items in the registry.
This transformation of the SBOM should be invisible to users - it should happen automatically. We should provide controls to disable this functionality for the user (and maybe the scanner as well?) in the case that this information isn't desired.