-
I think that, in the context of reproducibility and secure software supply chain delivery, lockfiles as a concept make sense, and from my layman's perspective the format looks plausible. However, I don't feel competent enough to review it in depth. What I, as a stakeholder, am interested in knowing is the following:
-
Looking at the lockfile format I can see a repeating pattern of:
IOW any tool that wants to process this has to analyze every single package to categorize it under some repository (optionally denoted by repoid), so if repoid isn't provided, such a tool is free to choose how to group uncategorized packages (assigning them to one or multiple repos). Wouldn't it be better if we simplified such a tool's job by explicitly hinting which packages belong to which repo? I mean in my mind that would match the current
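To make the grouping step concrete, here is a minimal sketch of what a consuming tool has to do when the repository is only an optional per-package hint. The key names `url` and `repoid`, the fallback bucket name, and the URLs are assumptions made for illustration, not taken from the proposal.

```python
from collections import defaultdict

# Hypothetical package entries: "url" and the optional "repoid" key are
# assumed names, and the URLs are placeholders.
packages = [
    {"url": "https://example.com/repo-a/foo-1.0-1.x86_64.rpm", "repoid": "repo-a"},
    {"url": "https://example.com/repo-a/bar-2.1-3.x86_64.rpm", "repoid": "repo-a"},
    {"url": "https://example.com/misc/baz-0.9-2.noarch.rpm"},  # no repoid given
]


def group_by_repoid(entries, fallback="uncategorized"):
    """Group package URLs by repoid; entries without one share a fallback bucket."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.get("repoid", fallback)].append(entry["url"])
    return dict(groups)


grouped = group_by_repoid(packages)
for repoid, urls in grouped.items():
    print(repoid, len(urls))
```

The fallback bucket is exactly the part each tool is free to decide for itself, which is why an explicit per-repository grouping in the file would remove the ambiguity.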
-
@Tojaj @lubomir @onosek I just read rpm-software-management/dnf5#833, which actually pitched the idea of standardizing on the format across the whole RPM ecosystem rather than letting other communities implement a similar idea from scratch over and over again. Wouldn't this proposal be better suited to that discussion, and bring it back to life again?
-
Something that occurred to me just now: why does the lockfile contain multiple architectures? Isn't the purpose of a more-or-less generic lockfile to address a single use case? IOW the way I'm now looking at the format, it looks like whatever is consuming the lockfile is supposed to prefetch data for multiple parallel container build runs based on architecture, so it feels like a batch operation. I'm ambivalent about such a design, so I wonder, from a given pipeline's black-box perspective, whether the nuance of architectures as a list isn't an implementation detail of a given pipeline rather than a generic thing that may be useful outside of the intended private use case.
-
Reading all the fuss about repoid, are you sure you are targeting the right project (rpm)? I feel you would rather want to engage with package managers like DNF, zypper, dragora etc. A relevant request for DNF5 is rpm-software-management/dnf5#833.
-
Yeah, this doesn't seem particularly relevant to rpm itself. But if people want to use this as a depsolver-agnostic place to discuss it, you're welcome to do so.
-
I agree with @ppisar and @pmatilai above, this proposal seems to be sitting one "floor" above us 😄 We don't deal with repositories or the distribution of packages in general. That said, of course, if anything comes out of this discussion that impacts RPM itself, we're happy to help. Also, like Panu said above, feel free to keep the discussion here now that it's ongoing.
-
I just want to point out that the structure might be the other way around. Instead
To use
Then I discovered one problem that arises when one file is used for all architectures. Repositories have the same problem as RPMs: not all repositories are available for all architectures. I also do not recommend combining repositories and URLs for packages. Please pick one option, not a combination of them, for any particular package. Repositories use metalinks to distribute network traffic and to improve stability and performance.
-
I wonder if you need to store the repo file. Just having a name is not of much use, especially for yum/dnf, where the name is in the local file and can be changed at will. You will also need to store $release (you obviously already have $arch) to be able to interpret the links in there.
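As an illustration of why the variable values matter, here is a minimal sketch that expands a repo URL template. It assumes dnf-style placeholders such as `$releasever` and `$basearch`, uses Python's `string.Template` as a stand-in rather than dnf's own substitution logic, and the URL and values are made up.

```python
from string import Template

# Hypothetical repo URL template and recorded values; the variable names follow
# the dnf-style $releasever / $basearch convention, the host is a placeholder.
baseurl = "https://example.com/releases/$releasever/Everything/$basearch/os/"
recorded_vars = {"releasever": "39", "basearch": "x86_64"}


def expand_repo_url(template: str, variables: dict) -> str:
    """Substitute $name placeholders; raises KeyError if a value was not recorded."""
    return Template(template).substitute(variables)


print(expand_repo_url(baseurl, recorded_vars))
```

Without the recorded release value the template cannot be resolved at all, which is the point being made here: a repo name or a raw link alone is not enough to reproduce the download.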
-
RPM checks dependencies, doesn't it? Then I don't see why RPM should not allow only the dependencies specified in some lock file, maybe ignoring the full URL and just focusing on NVR.
-
Just to keep this in sync with what was discussed via a private channel: package checksums (in some form) will need to be introduced to the format.
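A minimal sketch of how a consumer might verify such a checksum after download. The `algorithm:hexdigest` string form is an assumption for illustration, since the thread does not specify how checksums will be represented, and the payload is dummy data.

```python
import hashlib


def verify_checksum(data: bytes, expected: str) -> bool:
    """Check data against an 'algorithm:hexdigest' string such as 'sha256:ab12...'."""
    algorithm, _, hexdigest = expected.partition(":")
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest == hexdigest.lower()


# Dummy payload standing in for the bytes of a downloaded RPM.
payload = b"pretend this is the content of a downloaded RPM"
recorded = "sha256:" + hashlib.sha256(payload).hexdigest()

print(verify_checksum(payload, recorded))         # True
print(verify_checksum(payload + b"x", recorded))  # False
```

Carrying the algorithm name alongside the digest would let the format move to a different hash later without changing its structure.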
-
A thought on a possible (future?) extension of the format. It is based on a remark from the RPM team during a meeting some time ago, where they mentioned that a lock file could serve as a manifest. I was thinking about that and indeed, "manifest" may be a nice potential use case for the format. I think that if we add two more pieces of information for each RPM, then this format would be able to serve as a manifest quite well. The two pieces of information would be:
Example of the extended format (btw. the order of attributes doesn't matter as these are dictionaries):
These two new attributes would be necessary if someone wants to use the format as a manifest, because there is no guarantee that the RPM filename in the URL contains all the important information. Btw., to illustrate the need for these, see the manifests used by CoreOS: https://github.com/coreos/fedora-coreos-config/blob/testing-devel/manifest-lock.x86_64.json
-
Introduction of rpms.lock.yaml file
Context
My team is currently working on the implementation of a hermetic build process for containers that use RPMs. The build process runs in a network-isolated build environment. To implement this, we need to pre-fetch all required RPMs and the full chain of their transitive dependencies so that they are available during the build (except for packages that are already installed in the parent container image). As part of this requirement, we also want to strive towards reproducibility. To pre-fetch all required RPMs, including dependencies, and to be able to pre-fetch the same set of them when we re-run the build with the same input parameters, we need a "lock" file similar to the one known from Python: a requirements.txt that is programmatically generated from an input file called requirements.in.
To be transparent and to give you a chance to provide feedback as RPM ecosystem SMEs, I would like to present the format of the lock file we designed.
For more details about our requirements for the container build process, see the SLSA requirements, especially these:
rpms.lock.yaml
A file that contains a list of fully resolved dependencies (their URLs) that cachi2 (https://github.com/containerbuildsystem/cachi2/) will need to download for a hermetic build. This file contains a different list of RPMs per architecture. Only the RPMs listed in this file will be available during the build process as the build process has no access to the internet.
Note: This file contains only RPMs that will be installed on top of the parent image - i.e. RPMs that are required but are already installed in the parent container image are not included in this file.
This will be generated and maintained programmatically based on an input file (rpms.in.yaml) that is out of the scope of this doc.
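The per-architecture layout described above can be sketched as follows. This is only an illustration under stated assumptions: the key names `arches`, `arch`, `packages` and `url` and all URLs are hypothetical, and the lock file is shown as an already-parsed Python dictionary rather than as YAML.

```python
# Hypothetical in-memory form of a parsed lock file; "arches", "arch",
# "packages" and "url" are assumed key names, and the URLs are placeholders.
lockfile = {
    "arches": [
        {
            "arch": "x86_64",
            "packages": [
                {"url": "https://example.com/x86_64/foo-1.0-1.x86_64.rpm"},
                {"url": "https://example.com/x86_64/bar-2.1-3.noarch.rpm"},
            ],
        },
        {
            "arch": "aarch64",
            "packages": [
                {"url": "https://example.com/aarch64/foo-1.0-1.aarch64.rpm"},
            ],
        },
    ]
}


def urls_for_arch(lock: dict, arch: str) -> list:
    """Return the URLs to pre-fetch for one architecture."""
    for section in lock["arches"]:
        if section["arch"] == arch:
            return [pkg["url"] for pkg in section["packages"]]
    raise KeyError(f"no package list for architecture {arch!r}")


for url in urls_for_arch(lockfile, "aarch64"):
    print(url)
```

A pre-fetching tool would run a lookup like this once per target architecture and download only the URLs it returns, since nothing else is reachable during the build.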
📔 Notes about format design
dnf info $PKG). This is beneficial, for example, for the container vulnerability scanning tool Clair.
⚙️ Example
We understand that managing such a lock file manually is going to be very cumbersome and difficult. The long-term plan is to have a tool that will be able to automate it. This however is not within the scope of our minimal viable product.
Possible future extensions
These are some possible extensions that we envision may become relevant at some point in the future and that can be added easily because of the YAML format, but they are not planned right now as our use case doesn't need them.