Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add text to allow reproducible builds #214

Merged
merged 2 commits into from
Apr 26, 2023

Conversation

david-a-wheeler
Copy link
Contributor

No description provided.

Signed-off-by: David A. Wheeler <dwheeler@dwheeler.com>
Copy link
Member

@swinslow swinslow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any input on the substance, just flagging a couple of nit typos. Thanks!

model/Core/Classes/CreationInfo.md Outdated Show resolved Hide resolved
model/Core/Properties/created.md Outdated Show resolved Hide resolved
Signed-off-by: David A. Wheeler <dwheeler@dwheeler.com>
@seabass-labrax
Copy link

seabass-labrax commented Apr 17, 2023

This is something that I have given much thought to in the past, and my conclusion was that 'reproducible' SBOMs is an inappropriate goal for SPDX.

For the first case of 'build time', it is important to understand that a Build Element refers to a build event. It describes not that Package C could have been built from Package A and B, but rather that it was actually done so.

Omitting both the start and end times would preclude the use of SPDX data in what is probably the fields' most useful application. Imagine that I have a CI system, which is running on a Linux system that can be accessed via SSH. I discover today that my SSH private key was stolen on the 21st of December 2022, so I regenerate the SSH key immediately and do a clean reinstall the CI software. The existence of the build times in my SPDX data allows me to identify which specific packages were built while the CI system may have been compromised - in pseudo-SQL, SELECT id, name FROM elements WHERE type = 'Build' AND buildStart > '2022-12-21' AND buildEnd < '2023-04-17'. I can then revoke the signatures on those packages, inform users of the risk of 'Trojan' malware and, if those source packages themselves supported reproducible builds, test each one to see if any were tampered with during compilation.

From my extensive involvement with the subcommittee focused on the 'build profile' in SPDX, this is the most prominent use-case. Whilst it might not be a problem to make build times optional in SPDX, I believe that promoting this in order to make the SPDX document itself reproducible would be a mistake.

Regarding the second case of 'created', I could not think of any reason why it would be necessary to make SBOMs themselves reproducible. Each SPDX 3.0 Element with have a unique, immutable representation called the 'Canonical' form, allowing one to check if two serialised Elements are the same. If one wants to ensure that the metadata that SPDX communicates is stable and reproducible from an analysis standpoint, one can simply compare (using the same method) all the properties inside the Elements except the IDs and creation information.

I hope this post elucidates the topic and provides clarity into why SPDX Elements are immutable, individual and considered to have been created at an instant in time.

@david-a-wheeler
Copy link
Contributor Author

Please note that I did NOT touch the buildevent entries, only on creation events.

You can use buildevents to record the actual buildevent dateTimes. An SPDX file with such information won't be reproducible, and thus would be inappropriate for many use cases.

I think this threads the needle - you can record information that is per-instance, but you don't necessarily drop reproducible builds.

@goneall
Copy link
Member

goneall commented Apr 17, 2023

My understanding is that the CreationInfo created property refers to the date the SPDX metadata for the Element was created not when the element itself was created.

@david-a-wheeler
Copy link
Contributor Author

david-a-wheeler commented Apr 18, 2023

My understanding is that the CreationInfo created property refers to the date the SPDX metadata for the Element was created not when the element itself was created.

Yes. I view that as a bug & I'm trying to fix it :-).

The problem is that many people are creating SDPX data during the build process. When the build process is re-rerun (e.g., to verify reproducible builds), the SDPX data would be regenerated. If the SPDX data is included in comparisons (and that's often the easy way to do things), the builds aren't reproducible. The simple solution is to not generate SPDX, but that seems like a terrible solution. Ideally you'd compare "only the generated executable package" but some people want to embed SPDX into the package, making that kind of separation undesirable.

In reproducible builds, typically the "creation" time is forced to be equal to the "time of last commit that changed the source". This simply allows SPDX files to use this common convention. For more information see https://reproducible-builds.org/docs/timestamps/.

@goneall
Copy link
Member

goneall commented Apr 18, 2023

but some people want to embed SPDX into the package, making that kind of separation undesirable.

Good point. There are actually a few issues with embedding SPDX in the package itself. One problem is you'd like to have a checksum of the package in the SPDX document itself for validation. A bit difficult if the SPDX document is actually inside the package.

One solution is to generate the SPDX document on every build but not consider it to be "inside" the package, but rather something produced alongside the package. Much in the same way a SHA256 is generated for a package download file and included in a directory alongside the package itself.

@david-a-wheeler
Copy link
Contributor Author

Another solution, by the way, would be to make the datetime optional. Either allowing the datetime to be optional or allowing datetime to be the datetime of last source change would resolve the problem.

@rpurdie
Copy link

rpurdie commented Apr 18, 2023

but some people want to embed SPDX into the package, making that kind of separation undesirable.

Good point. There are actually a few issues with embedding SPDX in the package itself. One problem is you'd like to have a checksum of the package in the SPDX document itself for validation. A bit difficult if the SPDX document is actually inside the package.

One solution is to generate the SPDX document on every build but not consider it to be "inside" the package, but rather something produced alongside the package. Much in the same way a SHA256 is generated for a package download file and included in a directory alongside the package itself.

Yocto Project builds view things as inputs (software or a recipe in our terminology) which have multiple outputs. One output could be packages, another could be the SPDX document. There could also be firmware binaries, images, test reports and so on. In our situation, we closely control all the inputs complete with checksums. We then require the same outputs to be generated for a given set of inputs. In general we achieve this down to the file timestamps. SPDX manifests are sadly not reproducible today due to the creation timestamps. Like David, I believe this is a bug. If we can omit them, or set them to our "last source change" date, we can make them reproducible and I think that it is a worthy goal. It might not be appropriate everywhere but I believe it is for use cases like ours. We could just "bend" the spec but I'd prefer to be up front and open about this!

@goneall
Copy link
Member

goneall commented Apr 19, 2023

Thanks @david-a-wheeler and @rpurdie for the details on the use case. I'm starting to agree with you, but I think we need a bit more context in the description. I'll suggest some wording.

model/Core/Classes/CreationInfo.md Show resolved Hide resolved
model/Core/Properties/created.md Show resolved Hide resolved
@goneall goneall mentioned this pull request Apr 23, 2023
7 tasks
Copy link
Contributor

@kestewart kestewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised wording looks reasonable, and addresses David & Richards concerns. @goneall - I think they've addressed your concerns, but if you dissagree, please open an issue.

@kestewart kestewart merged commit 0c3b7c7 into spdx:main Apr 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants