-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add text to allow reproducible builds #214
Conversation
Signed-off-by: David A. Wheeler <dwheeler@dwheeler.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any input on the substance, just flagging a couple of nit typos. Thanks!
This is something that I have given much thought to in the past, and my conclusion was that 'reproducible' SBOMs is an inappropriate goal for SPDX. For the first case of 'build time', it is important to understand that a Omitting both the start and end times would preclude the use of SPDX data in what is probably the fields' most useful application. Imagine that I have a CI system, which is running on a Linux system that can be accessed via SSH. I discover today that my SSH private key was stolen on the 21st of December 2022, so I regenerate the SSH key immediately and do a clean reinstall the CI software. The existence of the build times in my SPDX data allows me to identify which specific packages were built while the CI system may have been compromised - in pseudo-SQL, From my extensive involvement with the subcommittee focused on the 'build profile' in SPDX, this is the most prominent use-case. Whilst it might not be a problem to make build times optional in SPDX, I believe that promoting this in order to make the SPDX document itself reproducible would be a mistake. Regarding the second case of 'created', I could not think of any reason why it would be necessary to make SBOMs themselves reproducible. Each SPDX 3.0 Element with have a unique, immutable representation called the 'Canonical' form, allowing one to check if two serialised Elements are the same. If one wants to ensure that the metadata that SPDX communicates is stable and reproducible from an analysis standpoint, one can simply compare (using the same method) all the properties inside the Elements except the IDs and creation information. I hope this post elucidates the topic and provides clarity into why SPDX Elements are immutable, individual and considered to have been created at an instant in time. |
Please note that I did NOT touch the buildevent entries, only on creation events. You can use buildevents to record the actual buildevent dateTimes. An SPDX file with such information won't be reproducible, and thus would be inappropriate for many use cases. I think this threads the needle - you can record information that is per-instance, but you don't necessarily drop reproducible builds. |
My understanding is that the CreationInfo created property refers to the date the SPDX metadata for the Element was created not when the element itself was created. |
Yes. I view that as a bug & I'm trying to fix it :-). The problem is that many people are creating SDPX data during the build process. When the build process is re-rerun (e.g., to verify reproducible builds), the SDPX data would be regenerated. If the SPDX data is included in comparisons (and that's often the easy way to do things), the builds aren't reproducible. The simple solution is to not generate SPDX, but that seems like a terrible solution. Ideally you'd compare "only the generated executable package" but some people want to embed SPDX into the package, making that kind of separation undesirable. In reproducible builds, typically the "creation" time is forced to be equal to the "time of last commit that changed the source". This simply allows SPDX files to use this common convention. For more information see https://reproducible-builds.org/docs/timestamps/. |
Good point. There are actually a few issues with embedding SPDX in the package itself. One problem is you'd like to have a checksum of the package in the SPDX document itself for validation. A bit difficult if the SPDX document is actually inside the package. One solution is to generate the SPDX document on every build but not consider it to be "inside" the package, but rather something produced alongside the package. Much in the same way a SHA256 is generated for a package download file and included in a directory alongside the package itself. |
Another solution, by the way, would be to make the datetime optional. Either allowing the datetime to be optional or allowing datetime to be the datetime of last source change would resolve the problem. |
Yocto Project builds view things as inputs (software or a recipe in our terminology) which have multiple outputs. One output could be packages, another could be the SPDX document. There could also be firmware binaries, images, test reports and so on. In our situation, we closely control all the inputs complete with checksums. We then require the same outputs to be generated for a given set of inputs. In general we achieve this down to the file timestamps. SPDX manifests are sadly not reproducible today due to the creation timestamps. Like David, I believe this is a bug. If we can omit them, or set them to our "last source change" date, we can make them reproducible and I think that it is a worthy goal. It might not be appropriate everywhere but I believe it is for use cases like ours. We could just "bend" the spec but I'd prefer to be up front and open about this! |
Thanks @david-a-wheeler and @rpurdie for the details on the use case. I'm starting to agree with you, but I think we need a bit more context in the description. I'll suggest some wording. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revised wording looks reasonable, and addresses David & Richards concerns. @goneall - I think they've addressed your concerns, but if you dissagree, please open an issue.
No description provided.