[sha512] Configurable digest support for container image builds #519
Add a new `manifest.DigestWithAlgorithm` function that allows computing the digest of a manifest using a specified algorithm (e.g., SHA256, SHA512) while properly handling v2s1 signed-manifest signature stripping. This addresses the need for skopeo's `--manifest-digest` flag to support different digest algorithms while correctly handling all manifest types, particularly Docker v2s1 signed manifests, which require signature stripping before digest computation.

Signed-off-by: Lokesh Mandvekar <lsm5@redhat.com>
Allow the user to specify a non-Canonical digest algorithm via `supporteddigests.TmpDigestForNewObjects()` instead of hardcoded `digest.Canonical` references. Without `--digest`, or with `--digest=sha256`, behavior remains unchanged (SHA256 is the default). Tested with a scratch-built image.

Signed-off-by: Lokesh Mandvekar <lsm5@redhat.com>
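As a rough illustration of the algorithm-selectable digesting these commits describe, here is a minimal stdlib-only sketch. `algorithmFor` and `digestManifest` are hypothetical helpers, not the PR's actual API, and the sketch omits the v2s1 signature stripping that `manifest.DigestWithAlgorithm` performs; the real code builds on `github.com/opencontainers/go-digest` instead of raw `hash.Hash` values.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"fmt"
	"hash"
)

// algorithmFor maps a user-facing name (e.g. from a --digest flag) to a
// hash constructor. Hypothetical helper for illustration only.
func algorithmFor(name string) (func() hash.Hash, error) {
	switch name {
	case "sha256":
		return sha256.New, nil
	case "sha512":
		return sha512.New, nil
	default:
		return nil, fmt.Errorf("unsupported digest algorithm %q", name)
	}
}

// digestManifest computes "<alg>:<hex>" over a raw manifest blob.
// (The real manifest.DigestWithAlgorithm also strips v2s1 signatures
// before hashing, which this sketch does not attempt.)
func digestManifest(name string, blob []byte) (string, error) {
	newHash, err := algorithmFor(name)
	if err != nil {
		return "", err
	}
	h := newHash()
	h.Write(blob)
	return fmt.Sprintf("%s:%x", name, h.Sum(nil)), nil
}

func main() {
	d, err := digestManifest("sha512", []byte(`{"schemaVersion":2}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(d)
}
```

With this shape, "without `--digest` behavior remains unchanged" falls out naturally: the default name is simply `"sha256"`.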
✅ A new PR has been created in buildah to vendor these changes: containers/buildah#6561
mtrmac left a comment:
Thanks!
This serves well to highlight some of the design pain points (layer and image IDs)…
… but I’d also expect a lot of the complexity and thinking to center around the lookups in tryReusingBlobAsPending and while obtaining filename in createNewLayer, especially in mixed-algorithm scenarios. (BlobInfoCache might also be impacted and require a redesign.) I don’t know, maybe there is some simple logic that makes all of this simple to handle (“each storage.Layer only contains digest values all using the same algorithm????”), it’s not immediately obvious to me.
(“Request changes” at least for the .Encoded() calls in IDs.)
```diff
 	defer decompressed.Close()

-	diffID := digest.Canonical.Digester()
+	diffID := supporteddigests.TmpDigestForNewObjects().Digester()
```
This is more complex — the “Ensure that we always see the same “view” of a layer” code will compare the value to config’s RootFS.DiffIDs, and that might use a different digest.
Potentially, I guess we might need to compute two uncompressed digests: one to match against config, and one to identify the layer in c/storage at the user-wanted level of collision resistance.
```diff
 // ordinaryImageID is a digest of a config, which is a JSON value.
 // To avoid the risk of collisions, start the input with @ so that the input is not a valid JSON.
-tocImageID := digest.FromString("@With TOC:" + tocIDInput).Encoded()
+tocImageID := supporteddigests.TmpDigestForNewObjects().FromString("@With TOC:" + tocIDInput).Encoded()
```
This strips the algorithm ID; that would not make the new implementation “agile” (we couldn’t switch to a different algorithm that produces digests of the same length).
“What should be the new format image IDs” is a significant design choice (also shows up in Podman UI).
If we are moving from sha256 (and presumably breaking users), that’s potentially a significant opportunity to entirely change the image ID design. There’s precedent in containerd using a manifest digest as an image ID … although that would not remove the need for tocIDInput being a factor in image deduplication.
Compare #508 / containers/skopeo#2750 : either image IDs will be digests, and then we need an ~explicit algorithm ID, or they will continue to use the ~existing format and then they must be ~random and not directly derived from sha256.
Above, m.ImageID typically (not always) just returns the config digest; that might not be using a strong enough algorithm. I suppose there’s an argument that a sha256-digested image is inherently sha256-collision prone, and that computing sha512-based digests does not remove the sha256 collisions but adds a risk of sha512 collisions on top. I don’t know what we want to promise here.
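The `.Encoded()` agility problem can be made concrete with two stdlib algorithms whose outputs are the same length. This toy sketch (hypothetical helpers, not c/image code) shows that bare-hex IDs from sha256 and sha512/256 are indistinguishable by format, whereas keeping an explicit algorithm identifier makes a future switch detectable:

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"fmt"
)

// bareIDs returns the hex-only ("Encoded()-style") IDs for the same input
// under sha256 and sha512/256: both are 64 hex characters, so nothing in
// the ID itself says which algorithm produced it.
func bareIDs(data []byte) (string, string) {
	a := sha256.Sum256(data)
	b := sha512.Sum512_256(data)
	return fmt.Sprintf("%x", a), fmt.Sprintf("%x", b)
}

// agileIDs keeps an explicit algorithm identifier, so a later algorithm
// change remains visible even when output lengths collide.
func agileIDs(data []byte) (string, string) {
	a, b := bareIDs(data)
	return "sha256:" + a, "sha512-256:" + b
}

func main() {
	a, b := bareIDs([]byte("config JSON"))
	fmt.Println(len(a), len(b)) // both 64: format alone cannot tell them apart
	fmt.Println(agileIDs([]byte("config JSON")))
}
```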
> Above, m.ImageID typically (not always) just returns the config digest;
Also, 2 different manifests with the same config digest, and layer digests using different algorithms?!
> If we are moving from `sha256` (and presumably breaking users), that’s potentially a significant opportunity to entirely change the image ID design.

We need to ship something in v5.8 where (IIUC) we shouldn't break any existing sha256 workflows. Can we go with this for now, and then address this for Podman v6?
I agree the existing all-sha256 values need to stay unchanged (… in the situations where the values are predictable, i.e. = config digest; probably not in the TOC case here — that value has never been documented, and that code path is IIRC not reachable in the default configuration [with diffID checking] anyway).
What crypto experts want from us is not just “migrate to sha512”, but “give us crypto agility so that we don’t have to do a many-quarter project to move to another hash in the future”, and basing image IDs on .Encoded() for non-sha256 doesn’t achieve that.
```diff
 		return component
 	}
-	return digest.Canonical.FromString(parentID + "+" + component).Encoded()
+	return supporteddigests.TmpDigestForNewObjects().FromString(parentID + "+" + component).Encoded()
```
Similarly to image IDs, discarding the algorithm identifier would not be crypto-agile.
Layer IDs are ~easier in that they are not that prominent in the UI (although still visible), but, also, they are currently critical for layer sharing across images. Do we want to keep that design at all? Or should we store the “chain ID” values in a new field of a c/storage.Layer, and look up layers using that, maybe making layer IDs simply random 256-bit values?
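The two options in that question can both be sketched in a few lines. These are hypothetical illustrations (hardcoded sha512 stands in for the selected algorithm; neither function is c/storage API): a deterministic, algorithm-prefixed chain value versus a plain random layer ID with the chain value stored elsewhere.

```go
package main

import (
	"crypto/rand"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
)

// chainID derives a deterministic, algorithm-prefixed chain value from a
// parent chain ID and a layer component, instead of a bare .Encoded() hex.
// The "+" separator mirrors the existing parentID + "+" + component input.
func chainID(parentID, component string) string {
	sum := sha512.Sum512([]byte(parentID + "+" + component))
	return fmt.Sprintf("sha512:%x", sum)
}

// randomLayerID sketches the alternative: layer IDs as plain random
// 256-bit values, with chain values kept in a separate c/storage.Layer
// field and used for lookups instead of the ID itself.
func randomLayerID() (string, error) {
	var b [32]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return hex.EncodeToString(b[:]), nil
}

func main() {
	fmt.Println(chainID("sha512:parent", "sha512:component"))
	id, err := randomLayerID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

The deterministic form preserves layer sharing across images (identical chains collide on purpose); the random form makes sharing an explicit lookup by the stored chain value.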
```diff
 // the per-platform one, which is saved below.
 if !bytes.Equal(toplevelManifest, s.manifest) {
-	manifestDigest, err := manifest.Digest(toplevelManifest)
+	manifestDigest, err := manifest.DigestWithAlgorithm(toplevelManifest, supporteddigests.TmpDigestForNewObjects())
```
These values end up in storage.Image.Digests and are used for lookups (see “Look for an image with the specified digest that has the same name,”); so, if the user pulls a repo@digest, we must allow lookup by exactly that digest.
I don’t immediately know what we do when pulling by tag. One option might be to pre-compute and allow lookup only by (probably, Canonical and) the preferred digest. Another might be to add support for lookups by an arbitrary digest algorithm to c/storage, where c/storage would actually digest all recorded manifests using the provided algorithm (and, I guess, cache the values once computed, although that might require RW locks where RO locks used to suffice).
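The second option (c/storage digesting all recorded manifests with an arbitrary requested algorithm, caching the results) might look roughly like this toy sketch. Everything here is hypothetical, sized down to two algorithms and an in-memory store, and ignores the locking question entirely:

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"fmt"
	"strings"
)

// store is a toy stand-in for recorded image manifests, plus a cache of
// computed digests so each manifest is hashed at most once per algorithm.
type store struct {
	manifests [][]byte
	cache     map[string][]string // algorithm -> digest per manifest index
}

// digestWith computes "alg:hex"; only the two algorithms relevant to this
// discussion are supported in the sketch.
func digestWith(alg string, blob []byte) string {
	switch alg {
	case "sha256":
		return fmt.Sprintf("sha256:%x", sha256.Sum256(blob))
	case "sha512":
		return fmt.Sprintf("sha512:%x", sha512.Sum512(blob))
	}
	return ""
}

// lookup finds a manifest by an arbitrary "alg:hex" digest, re-digesting
// all recorded manifests with that algorithm on first use and caching.
func (s *store) lookup(query string) int {
	alg, _, ok := strings.Cut(query, ":")
	if !ok {
		return -1
	}
	ds, cached := s.cache[alg]
	if !cached {
		ds = make([]string, len(s.manifests))
		for i, m := range s.manifests {
			ds[i] = digestWith(alg, m)
		}
		if s.cache == nil {
			s.cache = map[string][]string{}
		}
		s.cache[alg] = ds
	}
	for i, d := range ds {
		if d == query {
			return i
		}
	}
	return -1
}

func main() {
	s := &store{manifests: [][]byte{[]byte(`{"a":1}`), []byte(`{"b":2}`)}}
	fmt.Println(s.lookup(digestWith("sha512", s.manifests[1]))) // found by sha512
	fmt.Println(s.lookup("sha256:0000"))                        // miss
}
```

The cache is exactly the part that raises the RW-vs-RO lock concern above: a "read-only" lookup may now need to write cached digests.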
```diff
 	defer decompressed.Close()

-	diffID := digest.Canonical.Digester()
+	diffID := supporteddigests.TmpDigestForNewObjects().Digester()
```
As a more generic version of this concern, note that all over the place layer.UncompressedDigest and layer.TOCDigest only contains one value per layer; and options.Cache.Cache.UncompressedDigest{,ForTOC} only records one value per blob.
I don’t know what we want to do there… at least for the BlobInfoCache, we might determine that a compressed-sha256=A matches uncompressed-sha512=B (if pulling an old image configured with …ForNewObjects=sha512). Do we want to use that knowledge? Maybe we do, for future pulls (a sha256 collision means a collision at the registry), or at least for pulls from the same registry, but not for future pushes (if we have a sha512 image, we don’t want to regress to sha256 strength on uploads)??
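One way to avoid the one-value-per-blob limitation would be to key the cache on the algorithm as well. This is a hypothetical sketch of such a mapping, not the real `BlobInfoCache` interface; the policy decision (use a sha256-derived fact for pulls but not pushes?) is left to callers:

```go
package main

import "fmt"

// key identifies a cached fact: for this compressed blob, what is its
// uncompressed digest under this algorithm? Keeping the algorithm in the
// key means compressed-sha256 -> uncompressed-sha512 can coexist with a
// compressed-sha256 -> uncompressed-sha256 entry, instead of one value
// per blob as today.
type key struct {
	compressed string // e.g. "sha256:..."
	algorithm  string // algorithm of the uncompressed digest we want
}

type blobCache map[key]string

func (c blobCache) record(compressed, alg, uncompressed string) {
	c[key{compressed, alg}] = uncompressed
}

func (c blobCache) uncompressed(compressed, alg string) (string, bool) {
	v, ok := c[key{compressed, alg}]
	return v, ok
}

func main() {
	c := blobCache{}
	c.record("sha256:aaa", "sha512", "sha512:bbb")

	// Known under sha512, unknown under sha256: callers see explicitly
	// which algorithm a cached fact is stated in, and can apply
	// different trust policies for pulls and pushes.
	fmt.Println(c.uncompressed("sha256:aaa", "sha512"))
	fmt.Println(c.uncompressed("sha256:aaa", "sha256"))
}
```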
```diff
 // PutManifest writes the manifest to the destination.
 func (s *storageImageDestination) PutManifest(ctx context.Context, manifestBlob []byte, instanceDigest *digest.Digest) error {
-	digest, err := manifest.Digest(manifestBlob)
+	digest, err := manifest.DigestWithAlgorithm(manifestBlob, supporteddigests.TmpDigestForNewObjects())
```
See elsewhere about lookups by manifest digest.
… but, also, see what PutSignaturesWithFormat does with the value. I’m not sure that actually makes a difference, the reader seems to ignore the entry for the top-level manifest digest, but that needs tracking down.
Packit jobs failed. @containers/packit-build please check.
Rebased on #499. Only review the HEAD commit, with the `image/storage/storage_dest.go` changes, here.