Add SPDX format support for SBOM #608

Open
wants to merge 4 commits into
base: main

midnightercz

Support for the SPDX format was added to the fetch-deps command and also to merge_syft_sboms.
No changes were made to the individual package managers, which still generate components that are then converted to CycloneDX format. An SPDX SBOM can be obtained by calling Sbom.to_spdx().
A new switch, sbom-type, was added to merge_syft_sboms so the user can choose which output format is generated; the default is cyclonedx. Once all tooling is ready to consume SPDX SBOMs, the cutoff changes in this repository can begin.

Maintainers will complete the following section

  • Commit messages are descriptive enough
  • Code coverage from testing does not decrease and new code is covered
  • Docs updated (if applicable)
  • Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:

@brunoapimentel
Contributor

Just a quick heads-up: the utils/merge_syft_sbom.py script was removed from this repo and now lives here. So you'll need to propose the related changes there instead.

@midnightercz
Author

Just a quick heads-up: the utils/merge_syft_sbom.py script was removed from this repo and now lives here. So you'll need to propose the related changes there instead.

I thought I would eventually integrate the changes into the new subcommand implemented in #593, but it hadn't been merged yet when I started working on this one.

@eskultety eskultety changed the title [STONEBLD-2714] Added SDPX format support for SBOM Add SDPX format support for SBOM Aug 27, 2024
@MartinBasti
Contributor

typo in commit message and PR: s/SDPX/SPDX/

@midnightercz midnightercz changed the title Add SDPX format support for SBOM Add SPDX format support for SBOM Aug 29, 2024
@midnightercz
Author

Thanks for all the reviews and comments. I also wanted to add spdx support into merge-sboms subcommand but I noticed that in the tests you don't really test merged output. Or am I missing something?

Collaborator

@a-ovchinnikov a-ovchinnikov left a comment

Mostly LGTM, a couple of minor changes are needed. Edit: you would also need to make sure UTs pass.

@a-ovchinnikov
Collaborator

Thanks for all the reviews and comments. I also wanted to add spdx support into merge-sboms subcommand but I noticed that in the tests you don't really test merged output. Or am I missing something?

merge-sboms is a thin wrapper around an internal function that is tested elsewhere. It does not do much except some validation of input and then invoking a method for saving merge results (which is also tested elsewhere). There is not much value in doing this sort of test.

@a-ovchinnikov
Collaborator

Pushing the change to collect an early round of feedback on the rework. There is still some work to do, mainly to add more tests and to restructure the history to make it more manageable.

@chmeliik
Contributor

chmeliik commented Dec 17, 2024

Tried this out on https://github.com/cachito-testing/gomod-pandemonium

cachi2 fetch-deps --sbom-output-type spdx '[
        {"type": "gomod"},
        {"type": "gomod", "path": "terminaltor"},
        {"type": "gomod", "path": "weird"}
    ]'

Noticed a problem:

Multiple purls for one package

There are 15 packages with 2 purls (and curiously, none with more than 2)

jq < cachi2-output/bom.json '.packages | map(select(.externalRefs | length >= 2))'
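
For readers without jq, roughly the same check can be sketched in Python. This is only an illustration: the function name is hypothetical, and unlike the jq filter (which counts every external reference) it counts only references whose referenceType is "purl".

```python
import json


def packages_with_multiple_purls(sbom: dict) -> list:
    """Return the SPDX packages that carry two or more purl external references."""
    flagged = []
    for package in sbom.get("packages", []):
        purl_refs = [
            ref
            for ref in package.get("externalRefs", [])
            if ref.get("referenceType") == "purl"
        ]
        if len(purl_refs) >= 2:
            flagged.append(package)
    return flagged


def load_sbom(path: str) -> dict:
    """Load an SBOM from a JSON file, e.g. cachi2-output/bom.json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling packages_with_multiple_purls(load_sbom("cachi2-output/bom.json")) would list the suspicious packages for manual inspection.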

Some interesting ones:

crypto/rand vs. math/rand - ⚠️ definitely not the same package!
  {
    "SPDXID": "SPDXRef-Package-crypto/rand-None-8034fbc7583154cf2c496e52e0e50888e58253e533efd316eb8f586b4e151352",
    "name": "crypto/rand",
    "externalRefs": [
      {
        "referenceLocator": "pkg:golang/crypto/rand?type=package",
        "referenceType": "purl",
        "referenceCategory": "PACKAGE-MANAGER"
      },
      {
        "referenceLocator": "pkg:golang/math/rand?type=package",
        "referenceType": "purl",
        "referenceCategory": "PACKAGE-MANAGER"
      }
    ],
    "annotations": [
      {
        "annotator": "Tool:cachi2:jsonencoded",
        "annotationDate": "2024-12-17T13:06:33Z",
        "annotationType": "OTHER",
        "comment": "{\"name\": \"cachi2:found_by\", \"value\": \"cachi2\"}"
      }
    ],
    "downloadLocation": "NOASSERTION"
  }
crypto/internal/alias vs. vendor/golang.org/x/crypto/internal/alias

Arguably a similar package? Probably best not to group them though

  {
    "SPDXID": "SPDXRef-Package-crypto/internal/alias-None-b72e17ed348a15d861a8d6a517507932ed428364d043e0848384e16f50b6dbe1",
    "name": "crypto/internal/alias",
    "externalRefs": [
      {
        "referenceLocator": "pkg:golang/crypto/internal/alias?type=package",
        "referenceType": "purl",
        "referenceCategory": "PACKAGE-MANAGER"
      },
      {
        "referenceLocator": "pkg:golang/vendor/golang.org/x/crypto/internal/alias?type=package",
        "referenceType": "purl",
        "referenceCategory": "PACKAGE-MANAGER"
      }
    ],
    "annotations": [
      {
        "annotator": "Tool:cachi2:jsonencoded",
        "annotationDate": "2024-12-17T13:06:33Z",
        "annotationType": "OTHER",
        "comment": "{\"name\": \"cachi2:found_by\", \"value\": \"cachi2\"}"
      }
    ],
    "downloadLocation": "NOASSERTION"
  }
?type=module vs. ?type=package - probably fine to group these (but not needed IMO)
  {
    "SPDXID": "SPDXRef-Package-github.com/Masterminds/semver-v1.4.2-1160d2af59246b7be80ace7615339f4f826b6de1586871b2e4e6eb2def67585b",
    "name": "github.com/Masterminds/semver",
    "versionInfo": "v1.4.2",
    "externalRefs": [
      {
        "referenceLocator": "pkg:golang/github.com/Masterminds/semver@v1.4.2?type=module",
        "referenceType": "purl",
        "referenceCategory": "PACKAGE-MANAGER"
      },
      {
        "referenceLocator": "pkg:golang/github.com/Masterminds/semver@v1.4.2?type=package",
        "referenceType": "purl",
        "referenceCategory": "PACKAGE-MANAGER"
      }
    ],
    "annotations": [
      {
        "annotator": "Tool:cachi2:jsonencoded",
        "annotationDate": "2024-12-17T13:06:33Z",
        "annotationType": "OTHER",
        "comment": "{\"name\": \"cachi2:found_by\", \"value\": \"cachi2\"}"
      }
    ],
    "downloadLocation": "NOASSERTION"
  }

@chmeliik
Contributor

Then I also tried out SBOM merging on https://github.com/cachito-testing/pip-e2e-test

cachi2 fetch-deps --sbom-output-type spdx pip
cd ..
# the two repos are next to each other
cachi2 merge-sboms --sbom-output-type spdx gomod-pandemonium/cachi2-output/bom.json pip-e2e-test/cachi2-output/bom.json > merged.bom.json

First minor observation - the output SBOM type does not default to the input SBOM type, which feels weird

Validity problem - the SPDXRef-DocumentRoot-File- element disappears from the merged SBOM.

The individual SBOMs do have it, and the merged SBOM still references it in relationships:

    {
      "spdxElementId": "SPDXRef-DOCUMENT",
      "comment": "",
      "relatedSpdxElement": "SPDXRef-DocumentRoot-File-",
      "relationshipType": "DESCRIBES"
    },
    {
      "spdxElementId": "SPDXRef-DocumentRoot-File-",
      "comment": "",
      "relatedSpdxElement": "SPDXRef-Package-vendor/golang.org/x/sys/cpu-None-fff45ff8a23684e4ba82a21914e5418abe0d79156f3d0d8f33925961edde6d19",
      "relationshipType": "CONTAINS"
    }

But there's no such SPDXID in the .packages array
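
A check of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the check from the linked commit: the function name is hypothetical, and it assumes an SPDX 2.x JSON document with top-level "packages", "files" and "relationships" arrays.

```python
def find_dangling_refs(sbom: dict) -> set:
    """Return element IDs that relationships mention but the document never defines."""
    # The document itself is a valid relationship endpoint, as are the
    # special values NONE and NOASSERTION.
    defined = {sbom.get("SPDXID", "SPDXRef-DOCUMENT"), "NONE", "NOASSERTION"}
    defined.update(pkg["SPDXID"] for pkg in sbom.get("packages", []))
    defined.update(f["SPDXID"] for f in sbom.get("files", []))

    used = set()
    for rel in sbom.get("relationships", []):
        used.add(rel["spdxElementId"])
        used.add(rel["relatedSpdxElement"])

    return used - defined
```

For the merged SBOM described above, this would report SPDXRef-DocumentRoot-File- as dangling, because it appears in relationships but not among the packages.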

@chmeliik
Contributor

For reference, how I found out about the issues: I'm currently working on the script that merges cachi2 SBOMs with syft SBOMs, and the cachi2 SBOM failed this check chmeliik/build-tasks-dockerfiles@d4ee394

@chmeliik
Contributor

Another annoying thing: cachi2 merge-sboms --sbom-output-type spdx randomizes the relationships order

# use any non-empty SBOM
cp cachi2-output/bom.json bom1.json
cp cachi2-output/bom.json bom2.json
cachi2 merge-sboms --sbom-output-type spdx bom1.json bom2.json > merged1.json
cachi2 merge-sboms --sbom-output-type spdx bom1.json bom2.json > merged2.json

diff merged1.json merged2.json
# created date is different, which is fine, and relationships order doesn't match, which is annoying
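
Whatever the cause of the reordering turns out to be, one way to make repeated runs comparable is to deduplicate and sort relationships just before serialization. The sketch below is an assumption about how that could look, with hypothetical function names; it is not taken from the PR.

```python
def relationship_key(rel: dict) -> tuple:
    """Sort key that fully identifies a relationship, ignoring its comment."""
    return (rel["spdxElementId"], rel["relationshipType"], rel["relatedSpdxElement"])


def normalize_relationships(relationships: list) -> list:
    """Drop duplicate relationships and return them in a stable order.

    When two relationships share a key, the later one wins.
    """
    unique = {relationship_key(rel): rel for rel in relationships}
    return [unique[key] for key in sorted(unique)]
```

With the relationships normalized this way, diffing two merge results should only show fields that are expected to differ, such as the creation date.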

@a-ovchinnikov
Collaborator

There are 15 packages with 2 purls (and curiously, none with more than 2)

Turns out the original implementation ignored the Optional[namespace] element of purls when deduplicating packages. Taking it into account resolved the first two issues. I believe it is safe to keep type=module and type=package lumped together, at least for now. As for the "none with more than 2" part, I believe it is due to the structure of the repo: if it somehow had more variety in purl types, then I would expect to see more purls merged into a single package.
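
To illustrate why the namespace matters, here is a small self-contained sketch. The split_purl helper is hypothetical and deliberately simplified (it is not spec-complete and is not the packageurl library used in the PR); it only separates type, namespace, name and version.

```python
from typing import NamedTuple, Optional


class PurlParts(NamedTuple):
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def split_purl(purl: str) -> PurlParts:
    """Very simplified purl splitter for illustration only."""
    body = purl.removeprefix("pkg:")
    # Drop the subpath and the qualifiers; they are not part of this key.
    body = body.split("#", 1)[0].split("?", 1)[0]
    path, _, version = body.partition("@")
    purl_type, _, rest = path.partition("/")
    namespace, _, name = rest.rpartition("/")
    return PurlParts(purl_type, namespace or None, name, version or None)


def key_without_namespace(purl: str) -> tuple:
    parts = split_purl(purl)
    return (parts.type, parts.name, parts.version)


def key_with_namespace(purl: str) -> tuple:
    parts = split_purl(purl)
    return (parts.type, parts.namespace, parts.name, parts.version)
```

Under this sketch, pkg:golang/crypto/rand and pkg:golang/math/rand share a key when the namespace is ignored and get distinct keys once it is included, while the type=module and type=package variants of the same module still share a key, since qualifiers are not part of it.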

the output SBOM type does not default to the input SBOM type

The current default is still CycloneDX, so it is assumed that if you don't say otherwise you expect to get CyDX as output. This could be changed to SPDX, but I am not sure that trying to guess the correct output type based on the input type is a good idea: what should cachi2 do when it is asked to merge one CyDX and one SPDX?

@chmeliik
Contributor

The current default is still CycloneDX, so it is assumed that if you don't say otherwise you expect to get CyDX as output. This could be changed to SPDX, but I am not sure that trying to guess the correct output type based on the input type is a good idea: what should cachi2 do when it is asked to merge one CyDX and one SPDX?

Ah, I didn't know cachi2 supports that. The behaviour could be:

  • all inputs are cyclonedx => output cyclonedx
  • all inputs are spdx => output spdx
  • inputs are mixed => require explicit output-type param

But defaulting to CycloneDX is fine too
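
The proposed behaviour could be sketched as follows. All names here are hypothetical and this is not how cachi2 resolves the output type; the format detection relies on the "bomFormat" field of CycloneDX JSON and the "spdxVersion" field of SPDX 2.x JSON.

```python
from enum import Enum
from typing import Iterable, Optional


class SBOMFormat(str, Enum):
    cyclonedx = "cyclonedx"
    spdx = "spdx"


def detect_format(sbom: dict) -> SBOMFormat:
    """Guess the format of a parsed SBOM from its top-level marker field."""
    if sbom.get("bomFormat") == "CycloneDX":
        return SBOMFormat.cyclonedx
    if "spdxVersion" in sbom:
        return SBOMFormat.spdx
    raise ValueError("unrecognized SBOM format")


def resolve_output_format(
    inputs: Iterable[dict], requested: Optional[SBOMFormat] = None
) -> SBOMFormat:
    """Follow the inputs when they agree; otherwise demand an explicit choice."""
    if requested is not None:
        return requested
    formats = {detect_format(sbom) for sbom in inputs}
    if not formats:
        raise ValueError("no input SBOMs given")
    if len(formats) > 1:
        raise ValueError("mixed input formats: pass an explicit output type")
    return formats.pop()
```

The trade-off is that the mixed-input case raises an error unless the caller passes an explicit type, whereas a fixed default never does.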

@a-ovchinnikov
Collaborator

the SPDXRef-DocumentRoot-File- element disappears from the merged SBOM ...
cachi2 merge-sboms --sbom-output-type spdx randomizes the relationships order ...

Good catch, thank you. Fixed.

The behaviour could be: ...

Yes, it definitely could be, but I am more concerned about user perspective -- having a known default and a way to override it feels more straightforward than having some logic which sometimes works by itself and sometimes requires assistance. I think I'll keep the current behavior for now and will change it if there is more popular demand for that.

"""Validate that SPDXPackage includes only one purl with the same type, name, version."""
purls = [ref.referenceLocator for ref in refs if ref.referenceType == "purl"]
parsed_purls = [PackageURL.from_string(purl) for purl in purls if purl]
unique_purls_parts = set([(p.type, p.name, p.version) for p in parsed_purls])
Contributor

I'm not sure if treating purls as equal (or "similar enough") just by type, name, version (or type, namespace, name, version) is a good idea.

Cachi2 takes care to report "non-standard" dependencies as such by adding qualifiers and/or subpaths.

pkg:npm/my-package@1.0.0
pkg:npm/my-package@1.0.0?vcs_url=git+https://github.com/some-org/my-package@deadbeef#my-package

^ the first is a dependency from registry.npmjs.org, the second is from a subdirectory of a github repo. There's no guarantee that the two are related by anything other than coincidence

What are the intended semantics of having one package with multiple purls? Is it that you can pick any one of them arbitrarily and it will represent the same package? If so, that's currently not true
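
The concern can be made concrete with a short sketch. The helper names are hypothetical, and coarse_key only approximates a (type, namespace, name, version) comparison by cutting the purl string at the qualifiers and subpath.

```python
from collections import defaultdict
from typing import Callable


def coarse_key(purl: str) -> str:
    """Ignore qualifiers and subpath, keeping only the type/name@version part."""
    return purl.split("#", 1)[0].split("?", 1)[0]


def exact_key(purl: str) -> str:
    """Treat two purls as the same package only if they match exactly."""
    return purl


def group_purls(purls: list, key: Callable[[str], str]) -> dict:
    """Group purls under the given key function."""
    groups = defaultdict(list)
    for purl in purls:
        groups[key(purl)].append(purl)
    return dict(groups)


PURLS = [
    "pkg:npm/my-package@1.0.0",
    "pkg:npm/my-package@1.0.0?vcs_url=git+https://github.com/some-org/my-package@deadbeef#my-package",
]
```

Grouping PURLS with coarse_key puts the registry dependency and the git dependency into one group, while exact_key keeps them apart.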

Collaborator

The code at L394 appears to be validating a single package's list of purls. Having two unrelated purls that happen to share type, name and version within one package looks like a bug (and is reported as a run time issue on L396). Note that this code does not merge individual packages; that happens further down, around L503. The docstring says:

If package with same name and version is found multiple times in the list,
merge external references of all the packages into one package.

This could result in merging of two unrelated packages together (like in your example).

There's no guarantee that the two are related by anything other than coincidence
What are the intended semantics...?

My understanding of the semantics is that a list of purls represents all versions of a certain package (distinguishable by the type, name, version tuple) that were used to generate a certain artifact and either are present within it, or were present at some point. In the case when there are multiple unrelated (i.e. different), but identically named and versioned packages it would be currently impossible to distinguish between them and picking one at random would be a mistake. Is this a normal and expected situation?

Contributor

In the case when there are multiple unrelated (i.e. different), but identically named and versioned packages it would be currently impossible to distinguish between them and picking one at random would be a mistake. Is this a normal and expected situation?

Reposting from Slack:

I can't give you any stats, but some use cases are:

  • You're not building a library to be published to a package registry, you're building an app. There are probably hundreds of npm packages called backend or frontend at version 1.0.0
    • It is unlikely that you would see two or more of those in a single repo but cachi2 can also merge the SBOMs for multiple repos
  • Depending on a fork of a library instead of depending on the library directly. The original library and the fork will usually have the same name (except for Go, where the repo url kind of is the name).
    • This case is of course not "completely unrelated". But it's important to distinguish between the original and the fork
    • For example, Cargo even has a built-in mechanism for this

midnightercz and others added 4 commits December 24, 2024 21:44
Added SDPX format support for SBOM

Support for SPDX format was added to fetch-depds command and also
to merge_syft_sboms.
No changes were made in particular package manager generating components
which are then converted to cyclonedx format. SPDX sbom can be obtained
by calling Sbom.to_spdx().
New switch sbom-type was added to merge_syfy_sboms, so user can choose
which output format should be generated - default is cyclonedx.
Once all tooling is ready to consume spdx sboms, cutoff changes
in this repository can be started.

SPDXRef-DocumentRoot-File- includes all spdx packages and is set
to be described by SPDXRef-DOCUMENT. This way of spdx generation
is closer to way syft generates spdx

Included fixes from sbom validation tool

- Added required documentNamespace attribute
- Added creationInfo.created attribute
- changed SPDXID-Package-* to SPDXRef-Package
- changed annotationDate to not include ms
- changed annotator to include prefix Tool:

Co-authered-by: Alexey Ovchinnikov <aovchinn@redhat.com>
Signed-off-by: Jindrich Luza <jluza@redhat.com>

Having merge_outputs in global utils can result in import loops
in rather unexpected places. Moving it to a different scope to
prevent those from appearing.

Signed-off-by: Alexey Ovchinnikov <aovchinn@redhat.com>

This commit builds on top of proposed SPDX support and
addresses the following:
 * simplifies merge logic to align it with design in
   konflux-ci/architecture#213;
 * adjusts the model to make it work with Syft generated
   SPDX Sboms;
 * adds tests for some of existing and some new SPDX-CyDX
   handling and merging cases.

This is not the final form and more work still has to be done:
 * several important test cases are pending;
 * CyDX SBOM has to be amended to better fit into the new
   structure;
 * Decisions need to be additionally documented within the code;
 * This commit and its parent must be split into several
   parts to simplify repository history.

Co-authored-by: Jindrich Luza <midnightercz@gmail.com>
Signed-off-by: Alexey Ovchinnikov <aovchinn@redhat.com>

Adding test conditions and simplifying overall
code structure.

Signed-off-by: Alexey Ovchinnikov <aovchinn@redhat.com>

# Main function body.
cydxfiles, libraries = partition_by(lambda c: c.type == "library", self.components)
# mypy is upset by patrition_by being broadly typed.
Contributor

s/patrition_by/partition_by

"""

def create_document_root() -> SPDXPackage:
    return SPDXPackage(name="", versionInfo="", SPDXID="SPDXRef-DocumentRoot-File-")
Contributor

Is this element required by the spec? I tried to find references to it, but couldn't. I was also wondering why it ends with a hyphen.

# mypy is upset by patrition_by being broadly typed.
packages = [create_document_root()] + libs_to_packages(libraries) # type: ignore
files = files_to_packages(cydxfiles) # type: ignore
relationships = [create_root_relationship()] + link_to_root(packages) + link_to_root(files)
Contributor

It seems to me that the spec does not require these relationships to be provided (I also checked the official examples to try to figure this out).

Now, IIUC, we're saying that every dependency is contained by this "virtual" root package SPDXRef-DocumentRoot-File-. I'm wondering if this is adding any value in the end, as it does add a lot of extra lines to the SBOM.
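
To give a sense of the overhead being discussed, the structure can be sketched as one DESCRIBES relationship plus one CONTAINS relationship per package. This is an illustration based on the relationship JSON shown earlier in the conversation, with a hypothetical function name, not the code from the PR.

```python
def build_root_relationships(
    package_ids: list, root_id: str = "SPDXRef-DocumentRoot-File-"
) -> list:
    """Link the document to a virtual root, and the root to every package."""
    relationships = [
        {
            "spdxElementId": "SPDXRef-DOCUMENT",
            "comment": "",
            "relatedSpdxElement": root_id,
            "relationshipType": "DESCRIBES",
        }
    ]
    for package_id in package_ids:
        relationships.append(
            {
                "spdxElementId": root_id,
                "comment": "",
                "relatedSpdxElement": package_id,
                "relationshipType": "CONTAINS",
            }
        )
    return relationships
```

For N packages this yields N + 1 relationship objects, which is where the extra lines in the SBOM come from.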

)


class SPDXPackageExternalRefReferenceLocatorURI(pydantic.BaseModel):
Contributor

@brunoapimentel brunoapimentel Jan 2, 2025

Nitpick: Maybe we could move these model classes to a separate sbom-spdx.py module, as this one here has already grown too large. (we don't need to cover this in this PR)

9 participants