Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include go binary h1 digests in SPDX #1261

Closed
kzantow opened this issue Oct 13, 2022 · 7 comments · Fixed by #1265
Closed

Include go binary h1 digests in SPDX #1261

kzantow opened this issue Oct 13, 2022 · 7 comments · Fixed by #1265
Assignees
Labels
enhancement New feature or request

Comments

@kzantow
Copy link
Contributor

kzantow commented Oct 13, 2022

What would you like to be added:
Syft captures some metadata when cataloging go binaries, including the h1 digests. This information is currently output in Syft and CycloneDX (as a property) but not SPDX. There is a request to include the h1 digest information in SPDX format.

Based on the go mod docs, it appears we can put this information in the package checksum field after translation: h1 -> SHA256 and hex.Encode(base64.Decode(value)).

Why is this needed:
To align data between the formats and provide digest information for SPDX users.

Additional context:
This was originally reported by @deitch and has a little more context here: #546 (comment)

@kzantow kzantow added the enhancement New feature or request label Oct 13, 2022
@deitch
Copy link
Contributor

deitch commented Oct 13, 2022

Thanks @kzantow . I understand the hesitation. I figure there has to be somewhere for that to go. Would it fit within the purl?

@kzantow
Copy link
Contributor Author

kzantow commented Oct 13, 2022

We try to only include information in the PURL that is explicitly defined in the spec (although we do deviate from this out of necessity: to add required information for Grype to function properly).

Looking at the PURL spec again, the qualifiers section is probably what you're referring to, and we'd like to stick to known qualifiers. The good news is it looks like there is a checksum field. The bad news is it's defined as <lowercase_algorithm>:<hex_encoded_lowercase_value> but the h1 values we have aren't in this format, e.g. h1:crn/baboCvb5fXaQ0IJ1SGTsTVrWpDsCWC8EGETZijY=, I don't know offhand how the h1 would be parsed in order to be hex encoded (we could probably figure this out).

The other hesitation about using this information in the PURL is, the spec also says this about qualifiers:

Additional, separate external attributes stored outside of a purl are the preferred mechanism to convey extra long and optional information such as a download URL, vcs URL or checksums in an API, database or web form.

Sorry, I was confusing this a bit with another issue at first... SPDX does have a package checksum. However, this has the same issue as the PURL -- the format is lowercase hex encoded, and also a specific set of algorithms that can be used... if the h1 is actually one of those algorithms, we could probably figure out how to do this translation.

@kzantow
Copy link
Contributor Author

kzantow commented Oct 13, 2022

Aha! According to https://go.dev/ref/mod#go-sum-files

The hash column consists of an algorithm name (like h1) and a base64-encoded cryptographic hash, separated by a colon (:). Currently, SHA-256 (h1) is the only supported hash algorithm. If a vulnerability in SHA-256 is discovered in the future, support will be added for another algorithm (named h2 and so on).

So, this looks like a pretty simple translation: h1 -> SHA256 and hex.Encode(base64.Decode(value)) 👍

Would it be a problem for you if you don't see h1 digests directly in SPDX?

@deitch
Copy link
Contributor

deitch commented Oct 13, 2022

The bad news is it's defined as <lowercase_algorithm>:<hex_encoded_lowercase_value> but the h1 values we have aren't in this format, e.g. h1:crn/baboCvb5fXaQ0IJ1SGTsTVrWpDsCWC8EGETZijY=

According to the go modules reference, h1 indicates sha256, and the hash itself is base64-encoded. So essentially h1:crn/baboCvb5fXaQ0IJ1SGTsTVrWpDsCWC8EGETZijY= is the equivalent of sha256:72b9ff6da6e80af6f97d7690d082754864ec4d5ad6a43b02582f041844d98a36 (just did base64 decoding piped to hex encoding).

Which would be the right format? 🤷‍♂️

Maybe package checksum would be better? h1 = SHA256, and we can convert easily to hex?

@deitch
Copy link
Contributor

deitch commented Oct 13, 2022

Haha, crossed wires! We found it at the same time. 😁

I don't see it as an issue. The very reason you include an algorithm is so that you are explicit about how to use the content.

@kzantow
Copy link
Contributor Author

kzantow commented Oct 13, 2022

Yeah, this looks like a pretty simple change. I'll add it to the backlog and we should be able to knock this one out pretty quickly. 👍

@kzantow kzantow added this to OSS Oct 13, 2022
@kzantow kzantow moved this to Backlog (Pulled Forward for Priority) in OSS Oct 13, 2022
@deitch
Copy link
Contributor

deitch commented Oct 13, 2022

Cool, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants