Add SPDX format support for SBOM #608
base: main
I thought I would eventually integrate the changes into the new subcommand implemented in #593, but it hadn't been merged yet when I started working on this one.
Typo in commit message and PR: s/SDPX/SPDX/
Thanks for all the reviews and comments. I also wanted to add SPDX support to the merge-sboms subcommand, but I noticed that the tests don't really test the merged output. Or am I missing something?
Mostly LGTM, a couple of minor changes are needed. Edit: you would also need to make sure the UTs pass.
Pushing the change to collect an early round of feedback on the rework. There is still some work to do, mainly adding more tests and restructuring the history to make it more manageable.
Tried this out on https://github.com/cachito-testing/gomod-pandemonium:

cachi2 fetch-deps --sbom-output-type spdx '[
  {"type": "gomod"},
  {"type": "gomod", "path": "terminaltor"},
  {"type": "gomod", "path": "weird"}
]'

Noticed a problem: multiple purls for one package. There are 15 packages with 2 purls (and, curiously, none with more than 2):

jq < cachi2-output/bom.json '.packages | map(select(.externalRefs | length >= 2))'

Some interesting ones: crypto/rand vs. math/rand
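The same query can be written in Python when it needs to run inside a test or a validation script. This is a minimal sketch, assuming the SBOM has been loaded with json.load into a plain dict; the function name is hypothetical, and unlike the jq filter above it counts only external refs whose referenceType is "purl".

```python
from typing import Any


def packages_with_multiple_purls(sbom: dict[str, Any]) -> list[dict[str, Any]]:
    """Return SPDX packages that carry two or more purl external references."""
    result = []
    for package in sbom.get("packages", []):
        # Only purl refs are counted; other externalRefs (e.g. CPEs) are ignored.
        purls = [
            ref.get("referenceLocator")
            for ref in package.get("externalRefs", [])
            if ref.get("referenceType") == "purl"
        ]
        if len(purls) >= 2:
            result.append(package)
    return result
```

Usage: `packages_with_multiple_purls(json.load(open("cachi2-output/bom.json")))`.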
Then I also tried out SBOM merging on https://github.com/cachito-testing/pip-e2e-test:

cachi2 fetch-deps --sbom-output-type spdx pip
cd ..
# the two repos are next to each other
cachi2 merge-sboms --sbom-output-type spdx gomod-pandemonium/cachi2-output/bom.json pip-e2e-test/cachi2-output/bom.json > merged.bom.json

First minor observation: the output SBOM type does not default to the input SBOM type, which feels weird.

Validity problem: the … The individual SBOMs do have it, and the merged SBOM still references it in relationships:

{
  "spdxElementId": "SPDXRef-DOCUMENT",
  "comment": "",
  "relatedSpdxElement": "SPDXRef-DocumentRoot-File-",
  "relationshipType": "DESCRIBES"
},
{
  "spdxElementId": "SPDXRef-DocumentRoot-File-",
  "comment": "",
  "relatedSpdxElement": "SPDXRef-Package-vendor/golang.org/x/sys/cpu-None-fff45ff8a23684e4ba82a21914e5418abe0d79156f3d0d8f33925961edde6d19",
  "relationshipType": "CONTAINS"
}

But there's no such SPDXID in the …
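Dangling references like this one can be detected mechanically. A minimal sketch, assuming the SBOM is a plain dict in SPDX 2.3 JSON form and that referenceable elements are the document itself, its packages and its files; the function name is hypothetical. References into other documents (DocumentRef-…) and the special values NONE/NOASSERTION are skipped rather than resolved.

```python
from typing import Any

SPECIAL_VALUES = {"NONE", "NOASSERTION"}


def dangling_relationship_ids(sbom: dict[str, Any]) -> set[str]:
    """Return SPDXIDs used in relationships that no element of the document defines."""
    defined = {sbom.get("SPDXID", "SPDXRef-DOCUMENT")}
    defined.update(pkg["SPDXID"] for pkg in sbom.get("packages", []))
    defined.update(f["SPDXID"] for f in sbom.get("files", []))

    missing = set()
    for rel in sbom.get("relationships", []):
        for field in ("spdxElementId", "relatedSpdxElement"):
            ref = rel.get(field)
            # External document refs cannot be checked locally; skip them.
            if ref is None or ref in SPECIAL_VALUES or ref.startswith("DocumentRef-"):
                continue
            if ref not in defined:
                missing.add(ref)
    return missing
```

An empty result means every relationship endpoint resolves inside the document.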
For reference, here's how I found out about the issues: I'm currently working on a script that merges cachi2 SBOMs with Syft SBOMs, and the cachi2 SBOM failed this check: chmeliik/build-tasks-dockerfiles@d4ee394
Another annoying thing:

# use any non-empty SBOM
cp cachi2-output/bom.json bom1.json
cp cachi2-output/bom.json bom2.json
cachi2 merge-sboms --sbom-output-type spdx bom1.json bom2.json > merged1.json
cachi2 merge-sboms --sbom-output-type spdx bom1.json bom2.json > merged2.json
diff merged1.json merged2.json
# created date is different, which is fine, and relationships order doesn't match, which is annoying
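One way to make repeated merges diff cleanly is to sort relationships by a stable key before serializing. A minimal sketch, assuming the SBOM is a plain dict; the function names are hypothetical and the choice of key, the (source, type, target) triple, is one reasonable option rather than anything cachi2 currently does. Fields such as creationInfo.created will still differ between runs.

```python
import json
from typing import Any

REL_KEY_FIELDS = ("spdxElementId", "relationshipType", "relatedSpdxElement")


def sort_relationships(sbom: dict[str, Any]) -> dict[str, Any]:
    """Return a shallow copy of the SBOM with relationships in a stable order."""
    relationships = sorted(
        sbom.get("relationships", []),
        key=lambda rel: tuple(rel.get(field, "") for field in REL_KEY_FIELDS),
    )
    return {**sbom, "relationships": relationships}


def dump_stable(sbom: dict[str, Any]) -> str:
    # sort_keys also fixes the key order inside every JSON object.
    return json.dumps(sort_relationships(sbom), indent=2, sort_keys=True)
```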
Turns out the original implementation ignored
The current default is still CycloneDX, so if you don't say otherwise, it is assumed that you expect CyDX as output. This could be changed to SPDX, but I am not sure that guessing the correct output type based on the input type is a good idea: what should cachi2 do when it is asked to merge one CyDX and one SPDX SBOM?
Ah, I didn't know cachi2 supports that. The behaviour could be:
But defaulting to CycloneDX is fine too.
Good catch, thank you. Fixed.
Yes, it definitely could be, but I am more concerned about the user perspective: having a known default and a way to override it feels more straightforward than having some logic that sometimes works by itself and sometimes requires assistance. I'll keep the current behavior for now and change it if there is more demand for it.
"""Validate that SPDXPackage includes only one purl with the same type, name, version."""
purls = [ref.referenceLocator for ref in refs if ref.referenceType == "purl"]
parsed_purls = [PackageURL.from_string(purl) for purl in purls if purl]
unique_purls_parts = set([(p.type, p.name, p.version) for p in parsed_purls])
I'm not sure if treating purls as equal (or "similar enough") just by type, name, version (or type, namespace, name, version) is a good idea.
Cachi2 takes care to report "non-standard" dependencies as such by adding qualifiers and/or subpaths.
pkg:npm/my-package@1.0.0
pkg:npm/my-package@1.0.0?vcs_url=git+https://github.com/some-org/my-package@deadbeef#my-package
^ The first is a dependency from registry.npmjs.org, the second is from a subdirectory of a GitHub repo. There's no guarantee that the two are related by anything other than coincidence.
What are the intended semantics of having one package with multiple purls? Is it that you can pick any one of them arbitrarily and it will represent the same package? If so, that's currently not true
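To make the collision concrete: the two npm purls above agree on (type, name, version) and differ only in qualifiers and subpath. The sketch below uses a deliberately simplified, stdlib-only purl parser (it follows the right-to-left split order of the purl spec but is not a complete implementation; the production code uses PackageURL.from_string). All names here are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import unquote


class Purl(NamedTuple):
    type: str
    namespace: str
    name: str
    version: str
    qualifiers: str
    subpath: str


def _split_right(text: str, sep: str) -> tuple[str, str]:
    """Split on the rightmost sep; the tail is empty when sep is absent."""
    head, found, tail = text.rpartition(sep)
    return (head, tail) if found else (text, "")


def parse_purl(purl: str) -> Purl:
    """Simplified purl parser: enough to compare components, not spec-complete."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg" or not rest:
        raise ValueError(f"not a purl: {purl!r}")
    rest, subpath = _split_right(rest, "#")
    rest, qualifiers = _split_right(rest, "?")
    rest, version = _split_right(rest, "@")
    purl_type, _, path = rest.partition("/")
    namespace, _, name = path.rpartition("/")
    return Purl(purl_type.lower(), unquote(namespace), unquote(name),
                unquote(version), qualifiers, subpath)


def type_name_version(p: Purl) -> tuple[str, str, str]:
    # The coarse key used by the validator quoted above.
    return (p.type, p.name, p.version)
```

With `registry = parse_purl("pkg:npm/my-package@1.0.0")` and the vcs_url variant parsed the same way, `type_name_version` returns the same tuple for both, while the full Purl values compare unequal.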
The code at L394 appears to be validating a single package's list of purls. Having two unrelated purls that happen to share type, name and version within one package looks like a bug (and is reported as a runtime issue on L396). Note that this code does not merge individual packages; that happens further down, around L503. The docstring says:
If package with same name and version is found multiple times in the list,
merge external references of all the packages into one package.
This could result in two unrelated packages being merged together (as in your example).
There's no guarantee that the two are related by anything other than coincidence
What are the intended semantics...?
My understanding of the semantics is that a list of purls represents all versions of a certain package (distinguishable by the type, name, version tuple) that were used to generate a certain artifact and are either present within it or were present at some point. In the case when there are multiple unrelated (i.e. different), but identically named and versioned packages it would be currently impossible to distinguish between them and picking one at random would be a mistake. Is this a normal and expected situation?
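The merge rule from the docstring quoted above can be sketched on plain JSON dicts to show the failure mode under discussion. This is a hypothetical re-implementation for illustration only, not the PR's code: packages are keyed on (name, versionInfo) and their externalRefs are unioned, so two unrelated npm packages both called backend at 1.0.0 end up as one package carrying both purls.

```python
from typing import Any


def merge_by_name_version(packages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """One output package per (name, versionInfo); externalRefs are unioned in order."""
    merged: dict[tuple[str, str], dict[str, Any]] = {}
    for pkg in packages:
        key = (pkg["name"], pkg.get("versionInfo", ""))
        if key not in merged:
            # Copy so that the input packages are not mutated.
            merged[key] = {**pkg, "externalRefs": []}
        target_refs = merged[key]["externalRefs"]
        for ref in pkg.get("externalRefs", []):
            if ref not in target_refs:
                target_refs.append(ref)
    return list(merged.values())
```

Keying on the full purl instead of (name, versionInfo) would keep such packages apart, at the cost of no longer merging genuinely identical packages reported with slightly different purls.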
In the case when there are multiple unrelated (i.e. different), but identically named and versioned packages it would be currently impossible to distinguish between them and picking one at random would be a mistake. Is this a normal and expected situation?
Reposting from Slack:
I can't give you any stats, but some use cases are:
- You're not building a library to be published to a package registry, you're building an app. There are probably hundreds of npm packages called backend or frontend at version 1.0.0.
  - It is unlikely that you would see two or more of those in a single repo, but cachi2 can also merge the SBOMs for multiple repos.
- Depending on a fork of a library instead of depending on the library directly. The original library and the fork will usually have the same name (except for Go, where the repo URL kind of is the name).
  - This case is of course not "completely unrelated", but it's important to distinguish between the original and the fork.
  - For example, Cargo even has a built-in mechanism for this.
Added SPDX format support for SBOM

Support for SPDX format was added to the fetch-deps command and also to merge_syft_sboms. No changes were made in the individual package managers that generate components, which are then converted to CycloneDX format. An SPDX SBOM can be obtained by calling Sbom.to_spdx().

A new switch, sbom-type, was added to merge_syft_sboms so the user can choose which output format should be generated; the default is CycloneDX. Once all tooling is ready to consume SPDX SBOMs, cutoff changes in this repository can be started.

SPDXRef-DocumentRoot-File- includes all SPDX packages and is set to be described by SPDXRef-DOCUMENT. This way of SPDX generation is closer to the way Syft generates SPDX.

Included fixes from the SBOM validation tool:
- Added the required documentNamespace attribute
- Added the creationInfo.created attribute
- Changed SPDXID-Package-* to SPDXRef-Package
- Changed annotationDate to not include ms
- Changed annotator to include the prefix "Tool:"

Co-authored-by: Alexey Ovchinnikov <aovchinn@redhat.com>
Signed-off-by: Jindrich Luza <jluza@redhat.com>
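Two of the validator fixes above concern formatting: SPDX 2.x timestamps use the form YYYY-MM-DDThh:mm:ssZ (UTC, no fractional seconds), and annotators carry a type prefix such as "Tool: ". A minimal sketch of both, with hypothetical helper names; it is not the code from this commit.

```python
from datetime import datetime, timezone
from typing import Optional


def spdx_timestamp(moment: Optional[datetime] = None) -> str:
    """Format a timestamp as YYYY-MM-DDThh:mm:ssZ in UTC, dropping microseconds."""
    if moment is None:
        moment = datetime.now(timezone.utc)
    # astimezone() treats naive datetimes as local time, so prefer aware ones.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def tool_annotator(tool_name: str) -> str:
    # SPDX annotator/creator strings are typed, e.g. "Tool: <name>".
    return f"Tool: {tool_name}"
```

The same helper can serve both creationInfo.created and annotationDate, which keeps the two fields consistent.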
Having merge_outputs in global utils can result in import loops in rather unexpected places. Moving it to a different scope to prevent those from appearing.

Signed-off-by: Alexey Ovchinnikov <aovchinn@redhat.com>
This commit builds on top of the proposed SPDX support and addresses the following:
* simplifies merge logic to align it with the design in konflux-ci/architecture#213;
* adjusts the model to make it work with Syft-generated SPDX SBOMs;
* adds tests for some existing and some new SPDX-CyDX handling and merging cases.

This is not the final form and more work still has to be done:
* several important test cases are pending;
* the CyDX SBOM has to be amended to better fit into the new structure;
* decisions need to be additionally documented within the code;
* this commit and its parent must be split into several parts to simplify the repository history.

Co-authored-by: Jindrich Luza <midnightercz@gmail.com>
Signed-off-by: Alexey Ovchinnikov <aovchinn@redhat.com>
Adding test conditions and simplifying overall code structure.

Signed-off-by: Alexey Ovchinnikov <aovchinn@redhat.com>
# Main function body.
cydxfiles, libraries = partition_by(lambda c: c.type == "library", self.components)
# mypy is upset by patrition_by being broadly typed.
s/patrition_by/partition_by
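On the mypy point in that comment: a generically typed helper may remove the need for the `# type: ignore` further down. The sketch below is a hypothetical re-implementation, not the project's partition_by; its return order (non-matching first, matching second) is inferred from the call site, where the first list receives the non-library components and the second the libraries.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def partition_by(predicate: Callable[[T], bool], items: Iterable[T]) -> tuple[list[T], list[T]]:
    """Split items into (not matching, matching), preserving input order."""
    rejected: list[T] = []
    accepted: list[T] = []
    for item in items:
        (accepted if predicate(item) else rejected).append(item)
    return rejected, accepted
```

Because both lists are typed as list[T], mypy can infer the component type at the call site instead of falling back to a broad type.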
def create_document_root() -> SPDXPackage:
    return SPDXPackage(name="", versionInfo="", SPDXID="SPDXRef-DocumentRoot-File-")
Is this element required by the spec? I tried to find references to it, but couldn't. I was also wondering why it ends with a hyphen.
# mypy is upset by patrition_by being broadly typed.
packages = [create_document_root()] + libs_to_packages(libraries)  # type: ignore
files = files_to_packages(cydxfiles)  # type: ignore
relationships = [create_root_relationship()] + link_to_root(packages) + link_to_root(files)
It seems to me that the spec does not require these relationships to be present (I also checked the official examples to try to figure this out).
Now, IIUC, we're saying that every dependency is contained by this "virtual" root package, SPDXRef-DocumentRoot-File-. I'm wondering if this adds any value in the end, as it does add a lot of extra lines to the SBOM.
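To put a number on the extra lines: the structure visible in the merged-SBOM excerpt earlier (document DESCRIBES the root, root CONTAINS each package) costs n + 1 relationship objects for n packages. The sketch below reproduces that shape; it is a hypothetical stand-in, not the PR's create_root_relationship/link_to_root.

```python
DOCUMENT_ID = "SPDXRef-DOCUMENT"
ROOT_ID = "SPDXRef-DocumentRoot-File-"


def relationship(source: str, rel_type: str, target: str) -> dict[str, str]:
    # Same four fields as the relationship objects in the SBOM excerpt.
    return {
        "spdxElementId": source,
        "comment": "",
        "relatedSpdxElement": target,
        "relationshipType": rel_type,
    }


def root_relationships(package_ids: list[str]) -> list[dict[str, str]]:
    """Document DESCRIBES the virtual root; the root CONTAINS every other package."""
    relationships = [relationship(DOCUMENT_ID, "DESCRIBES", ROOT_ID)]
    relationships.extend(
        relationship(ROOT_ID, "CONTAINS", package_id)
        for package_id in package_ids
        if package_id != ROOT_ID
    )
    return relationships
```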
class SPDXPackageExternalRefReferenceLocatorURI(pydantic.BaseModel):
Nitpick: Maybe we could move these model classes to a separate sbom-spdx.py module, as this one has already grown too large (we don't need to cover this in this PR).
Support for SPDX format was added to the fetch-deps command and also to merge_syft_sboms.
No changes were made in the individual package managers that generate components, which are then converted to CycloneDX format. An SPDX SBOM can be obtained by calling Sbom.to_spdx().
A new switch, sbom-type, was added to merge_syft_sboms so the user can choose which output format should be generated; the default is CycloneDX. Once all tooling is ready to consume SPDX SBOMs, cutoff changes in this repository can be started.
Maintainers will complete the following section
Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:
/ok-to-test
(as is the standard for Pipelines as Code)