
Conversation


@lsm5 lsm5 commented Dec 2, 2025

Rebased on #499. Only review the HEAD commit with the image/storage/storage_dest.go changes here.

lsm5 added 2 commits December 1, 2025 08:42
Add a new `manifest.DigestWithAlgorithm` function that
allows computing the digest of a manifest using a specified algorithm
(e.g., SHA256, SHA512) while properly handling v2s1 signed manifest
signature stripping.

This addresses the need for skopeo's `--manifest-digest` flag to support
different digest algorithms while correctly handling all manifest types,
particularly Docker v2s1 signed manifests that require signature
stripping before digest computation.

Signed-off-by: Lokesh Mandvekar <lsm5@redhat.com>
Allow the user to specify non-Canonical digest algorithm via
`supporteddigests.TmpDigestForNewObjects()` instead of
hardcoded `digest.Canonical` references.

Without --digest or with --digest=sha256, behavior remains unchanged
(SHA256 is the default).

Tested with a scratch built image.

Signed-off-by: Lokesh Mandvekar <lsm5@redhat.com>
@github-actions github-actions bot added the image Related to "image" package label Dec 2, 2025
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Dec 2, 2025
@podmanbot

✅ A new PR has been created in buildah to vendor these changes: containers/buildah#6561


@mtrmac mtrmac left a comment


Thanks!

This serves well to highlight some of the design pain points (layer and image IDs)…

… but I’d also expect a lot of the complexity and thinking to center around the lookups in tryReusingBlobAsPending and while obtaining filename in createNewLayer, especially in mixed-algorithm scenarios. (BlobInfoCache might also be impacted and require a redesign.) I don’t know, maybe there is some simple logic that makes all of this simple to handle (“each storage.Layer only contains digest values all using the same algorithm????”), it’s not immediately obvious to me.

(“Request changes” at least for the .Encoded() calls in IDs.)

 defer decompressed.Close()

-diffID := digest.Canonical.Digester()
+diffID := supporteddigests.TmpDigestForNewObjects().Digester()
Contributor

This is more complex — the “Ensure that we always see the same “view” of a layer” code will compare the value to config’s RootFS.DiffIDs, and that might use a different digest.

Potentially, I guess we might need to compute two uncompressed digests: one to match against config, and one to identify the layer in c/storage at the user-wanted level of collision resistance.

 // ordinaryImageID is a digest of a config, which is a JSON value.
 // To avoid the risk of collisions, start the input with @ so that the input is not a valid JSON.
-tocImageID := digest.FromString("@With TOC:" + tocIDInput).Encoded()
+tocImageID := supporteddigests.TmpDigestForNewObjects().FromString("@With TOC:" + tocIDInput).Encoded()
Contributor

This strips the algorithm ID; that would not make the new implementation “agile” (we couldn’t switch to a different algorithm that produces digests of the same length).

“What should be the new format image IDs” is a significant design choice (also shows up in Podman UI).


If we are moving from sha256 (and presumably breaking users), that’s potentially a significant opportunity to entirely change the image ID design. There’s precedent in containerd using a manifest digest as an image ID … although that would not remove the need for tocIDInput being a factor in image deduplication.

Compare #508 / containers/skopeo#2750 : either image IDs will be digests, and then we need an ~explicit algorithm ID, or they will continue to use the ~existing format and then they must be ~random and not directly derived from sha256.


Above, m.ImageID typically (not always) just returns the config digest; that might not be using a strong enough algorithm. I suppose there’s an argument that a sha256-digested image is inherently sha256-collision prone, and that computing sha512-based digests does not remove the sha256 collisions but adds a risk of sha512 collisions on top. I don’t know what we want to promise here.

Contributor

> Above, m.ImageID typically (not always) just returns the config digest;

Also, 2 different manifests with the same config digest, and layer digests using different algorithms?!

Member Author

> If we are moving from sha256 (and presumably breaking users), that’s potentially a significant opportunity to entirely change the image ID design.

We need to ship something in v5.8 where (IIUC) we shouldn't break any existing sha256 workflows. Can we go with this for now, and then address this in Podman v6?

Contributor

I agree the existing all-sha256 values need to stay unchanged (… in the situations where the values are predictable, i.e. = config digest; probably not in the TOC case here — that value has never been documented, and that code path is IIRC not reachable in the default configuration [with diffID checking] anyway).

What crypto experts want from us is not just “migrate to sha512”, but “give us crypto agility so that we don’t have to do a many-quarter project to move to another hash in the future”, and basing image IDs on .Encoded() for non-sha256 doesn’t achieve that.

 	return component
 }
-return digest.Canonical.FromString(parentID + "+" + component).Encoded()
+return supporteddigests.TmpDigestForNewObjects().FromString(parentID + "+" + component).Encoded()
Contributor

Similarly to image IDs, discarding the algorithm identifier would not be crypto-agile.

Layer IDs are ~easier in that they are not that prominent in the UI (although still visible), but, also, they are currently critical for layer sharing across images. Do we want to keep that design at all? Or should we store the “chain ID” values in a new field of a c/storage.Layer, and look up layers using that, maybe making layer IDs simply random 256-bit values?

 // the per-platform one, which is saved below.
 if !bytes.Equal(toplevelManifest, s.manifest) {
-manifestDigest, err := manifest.Digest(toplevelManifest)
+manifestDigest, err := manifest.DigestWithAlgorithm(toplevelManifest, supporteddigests.TmpDigestForNewObjects())
Contributor

These values end up in storage.Image.Digests and are used for lookups (see “Look for an image with the specified digest that has the same name,”); so, if the user pulls a repo@digest, we must allow lookup by exactly that digest.

I don’t immediately know what we do when pulling by tag. One option might be to pre-compute and allow lookup only by (probably, Canonical and) the preferred digest. Another might be to add support for lookups by an arbitrary digest algorithm to c/storage, where c/storage would actually digest all recorded manifests using the provided algorithm (and, I guess, cache the values once computed, although that might require RW locks where RO locks used to suffice).

 defer decompressed.Close()

-diffID := digest.Canonical.Digester()
+diffID := supporteddigests.TmpDigestForNewObjects().Digester()
Contributor

As a more generic version of this concern, note that, all over the place, layer.UncompressedDigest and layer.TOCDigest only contain one value per layer; and options.Cache.Cache.UncompressedDigest{,ForTOC} only records one value per blob.

I don’t know what we want to do there… at least for the BlobInfoCache, we might determine that a compressed-sha256=A matches uncompressed-sha512=B (if pulling an old image configured with …ForNewObjects=sha512). Do we want to use that knowledge? Maybe we do, for future pulls (a sha256 collision means a collision at the registry), or at least for pulls from the same registry, but not for future pushes (if we have a sha512 image, we don’t want to regress to sha256 strength on uploads)??
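One conceivable shape for lifting the one-value-per-blob limitation can be sketched as below. This is purely illustrative, with invented types and no relation to the actual BlobInfoCache API: the uncompressed-digest mapping is keyed by (compressed digest, algorithm), so a sha256 and a sha512 uncompressed digest can coexist for the same blob, and a lookup at a requested strength never silently returns a weaker value.

```go
package main

import "fmt"

// blobKey identifies one recorded uncompressed digest: which compressed
// blob it belongs to, and which algorithm the uncompressed digest uses.
// (Hypothetical sketch, not the BlobInfoCache interface.)
type blobKey struct {
	compressed string // e.g. "sha256:aaaa…"
	algorithm  string // algorithm of the uncompressed digest, e.g. "sha512"
}

type multiAlgoCache struct {
	uncompressed map[blobKey]string
}

func newMultiAlgoCache() *multiAlgoCache {
	return &multiAlgoCache{uncompressed: map[blobKey]string{}}
}

// Record stores an uncompressed digest without overwriting entries
// recorded under other algorithms for the same compressed blob.
func (c *multiAlgoCache) Record(compressed, algo, uncompressed string) {
	c.uncompressed[blobKey{compressed, algo}] = uncompressed
}

// Lookup returns the uncompressed digest only at the requested strength,
// so a push wanting sha512 cannot regress to a cached sha256 value.
func (c *multiAlgoCache) Lookup(compressed, algo string) (string, bool) {
	v, ok := c.uncompressed[blobKey{compressed, algo}]
	return v, ok
}

func main() {
	c := newMultiAlgoCache()
	c.Record("sha256:aaaa", "sha512", "sha512:bbbb")
	if _, ok := c.Lookup("sha256:aaaa", "sha256"); !ok {
		fmt.Println("no sha256 uncompressed digest recorded for this blob")
	}
}
```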

 // PutManifest writes the manifest to the destination.
 func (s *storageImageDestination) PutManifest(ctx context.Context, manifestBlob []byte, instanceDigest *digest.Digest) error {
-digest, err := manifest.Digest(manifestBlob)
+digest, err := manifest.DigestWithAlgorithm(manifestBlob, supporteddigests.TmpDigestForNewObjects())
Contributor

See elsewhere about lookups by manifest digest.


… but, also, see what PutSignaturesWithFormat does with the value. I’m not sure that actually makes a difference, the reader seems to ignore the entry for the top-level manifest digest, but that needs tracking down.

@packit-as-a-service

Packit jobs failed. @containers/packit-build please check.
