
Conversation


@lsm5 lsm5 commented Dec 2, 2025

Rebased on #499. Only review the HEAD commit with the image/storage/storage_dest.go changes here.

lsm5 added 2 commits December 1, 2025 08:42
Add a new `manifest.DigestWithAlgorithm` function that
allows computing the digest of a manifest using a specified algorithm
(e.g., SHA256, SHA512) while properly handling v2s1 signed manifest
signature stripping.

This addresses the need for skopeo's `--manifest-digest` flag to support
different digest algorithms while correctly handling all manifest types,
particularly Docker v2s1 signed manifests that require signature
stripping before digest computation.

Signed-off-by: Lokesh Mandvekar <lsm5@redhat.com>
Allow the user to specify non-Canonical digest algorithm via
`supporteddigests.TmpDigestForNewObjects()` instead of
hardcoded `digest.Canonical` references.

Without --digest or with --digest=sha256, behavior remains unchanged
(SHA256 is the default).

Tested with a scratch built image.

Signed-off-by: Lokesh Mandvekar <lsm5@redhat.com>
@github-actions github-actions bot added the image Related to "image" package label Dec 2, 2025
podmanbot pushed a commit to podmanbot/buildah that referenced this pull request Dec 2, 2025
@podmanbot

✅ A new PR has been created in buildah to vendor these changes: containers/buildah#6561


@mtrmac mtrmac left a comment


Thanks!

This serves well to highlight some of the design pain points (layer and image IDs)…

… but I’d also expect a lot of the complexity and thinking to center around the lookups in tryReusingBlobAsPending and while obtaining filename in createNewLayer, especially in mixed-algorithm scenarios. (BlobInfoCache might also be impacted and require a redesign.) I don’t know, maybe there is some simple logic that makes all of this simple to handle (“each storage.Layer only contains digest values all using the same algorithm????”), it’s not immediately obvious to me.

(“Request changes” at least for the .Encoded() calls in IDs.)

 defer decompressed.Close()

-diffID := digest.Canonical.Digester()
+diffID := supporteddigests.TmpDigestForNewObjects().Digester()
Contributor

This is more complex — the “Ensure that we always see the same “view” of a layer” code will compare the value to config’s RootFS.DiffIDs, and that might use a different digest.

Potentially, I guess we might need to compute two uncompressed digests: one to match against config, and one to identify the layer in c/storage at the user-wanted level of collision resistance.

 // ordinaryImageID is a digest of a config, which is a JSON value.
 // To avoid the risk of collisions, start the input with @ so that the input is not a valid JSON.
-tocImageID := digest.FromString("@With TOC:" + tocIDInput).Encoded()
+tocImageID := supporteddigests.TmpDigestForNewObjects().FromString("@With TOC:" + tocIDInput).Encoded()
Contributor

This strips the algorithm ID; that would not make the new implementation “agile” (we couldn’t switch to a different algorithm that produces digests of the same length).

“What should be the new format image IDs” is a significant design choice (also shows up in Podman UI).


If we are moving from sha256 (and presumably breaking users), that’s potentially a significant opportunity to entirely change the image ID design. There’s precedent in containerd using a manifest digest as an image ID … although that would not remove the need for tocIDInput being a factor in image deduplication.

Compare #508 / containers/skopeo#2750 : either image IDs will be digests, and then we need an ~explicit algorithm ID, or they will continue to use the ~existing format and then they must be ~random and not directly derived from sha256.


Above, m.ImageID typically (not always) just returns the config digest; that might not be using a strong enough algorithm. I suppose there’s an argument that a sha256-digested image is inherently sha256-collision prone, and that computing sha512-based digests does not remove the sha256 collisions but adds a risk of sha512 collisions on top. I don’t know what we want to promise here.

Contributor

> Above, m.ImageID typically (not always) just returns the config digest;

Also, 2 different manifests with the same config digest, and layer digests using different algorithms?!

Member Author

> If we are moving from sha256 (and presumably breaking users), that’s potentially a significant opportunity to entirely change the image ID design.

We need to ship something in v5.8 where (IIUC) we shouldn't break any existing sha256 workflows. Can we go with this for now, and then address this in Podman v6?

Contributor

I agree the existing all-sha256 values need to stay unchanged (… in the situations where the values are predictable, i.e. = config digest; probably not in the TOC case here — that value has never been documented, and that code path is IIRC not reachable in the default configuration [with diffID checking] anyway).

What crypto experts want from us is not just “migrate to sha512”, but “give us crypto agility so that we don’t have to do a many-quarter project to move to another hash in the future”, and basing image IDs on .Encoded() for non-sha256 doesn’t achieve that.

 	return component
 }
-return digest.Canonical.FromString(parentID + "+" + component).Encoded()
+return supporteddigests.TmpDigestForNewObjects().FromString(parentID + "+" + component).Encoded()
Contributor

Similarly to image IDs, discarding the algorithm identifier would not be crypto-agile.

Layer IDs are ~easier in that they are not that prominent in the UI (although still visible), but, also, they are currently critical for layer sharing across images. Do we want to keep that design at all? Or should we store the “chain ID” values in a new field of a c/storage.Layer, and look up layers using that, maybe making layer IDs simply random 256-bit values?

 // the per-platform one, which is saved below.
 if !bytes.Equal(toplevelManifest, s.manifest) {
-manifestDigest, err := manifest.Digest(toplevelManifest)
+manifestDigest, err := manifest.DigestWithAlgorithm(toplevelManifest, supporteddigests.TmpDigestForNewObjects())
Contributor

These values end up in storage.Image.Digests and are used for lookups (see “Look for an image with the specified digest that has the same name,”); so, if the user pulls a repo@digest, we must allow lookup by exactly that digest.

I don’t immediately know what we do when pulling by tag. One option might be to pre-compute and allow lookup only by (probably, Canonical and) the preferred digest. Another might be to add support for lookups by an arbitrary digest algorithm to c/storage, where c/storage would actually digest all recorded manifests using the provided algorithm (and, I guess, cache the values once computed, although that might require RW locks where RO locks used to suffice).

 defer decompressed.Close()

-diffID := digest.Canonical.Digester()
+diffID := supporteddigests.TmpDigestForNewObjects().Digester()
Contributor

As a more generic version of this concern, note that, all over the place, layer.UncompressedDigest and layer.TOCDigest only contain one value per layer; and options.Cache.Cache.UncompressedDigest{,ForTOC} only records one value per blob.

I don’t know what we want to do there… at least for the BlobInfoCache, we might determine that a compressed-sha256=A matches uncompressed-sha512=B (if pulling an old image configured with …ForNewObjects=sha512). Do we want to use that knowledge? Maybe we do, for future pulls (a sha256 collision means a collision at the registry), or at least for pulls from the same registry, but not for future pushes (if we have a sha512 image, we don’t want to regress to sha256 strength on uploads)??
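One conceivable shape for lifting the one-value-per-blob limitation can be sketched as below. This is purely illustrative, with invented types and no relation to the actual BlobInfoCache API: the uncompressed-digest mapping is keyed by (compressed digest, algorithm), so a sha256 and a sha512 uncompressed digest can coexist for the same blob, and a lookup at a requested strength never silently returns a weaker value.

```go
package main

import "fmt"

// blobKey identifies one recorded uncompressed digest: which compressed
// blob it belongs to, and which algorithm the uncompressed digest uses.
// (Hypothetical sketch, not the BlobInfoCache interface.)
type blobKey struct {
	compressed string // e.g. "sha256:aaaa…"
	algorithm  string // algorithm of the uncompressed digest, e.g. "sha512"
}

type multiAlgoCache struct {
	uncompressed map[blobKey]string
}

func newMultiAlgoCache() *multiAlgoCache {
	return &multiAlgoCache{uncompressed: map[blobKey]string{}}
}

// Record stores an uncompressed digest without overwriting entries
// recorded under other algorithms for the same compressed blob.
func (c *multiAlgoCache) Record(compressed, algo, uncompressed string) {
	c.uncompressed[blobKey{compressed, algo}] = uncompressed
}

// Lookup returns the uncompressed digest only at the requested strength,
// so a push wanting sha512 cannot regress to a cached sha256 value.
func (c *multiAlgoCache) Lookup(compressed, algo string) (string, bool) {
	v, ok := c.uncompressed[blobKey{compressed, algo}]
	return v, ok
}

func main() {
	c := newMultiAlgoCache()
	c.Record("sha256:aaaa", "sha512", "sha512:bbbb")
	if _, ok := c.Lookup("sha256:aaaa", "sha256"); !ok {
		fmt.Println("no sha256 uncompressed digest recorded for this blob")
	}
}
```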

 // PutManifest writes the manifest to the destination.
 func (s *storageImageDestination) PutManifest(ctx context.Context, manifestBlob []byte, instanceDigest *digest.Digest) error {
-digest, err := manifest.Digest(manifestBlob)
+digest, err := manifest.DigestWithAlgorithm(manifestBlob, supporteddigests.TmpDigestForNewObjects())
Contributor

See elsewhere about lookups by manifest digest.


… but, also, see what PutSignaturesWithFormat does with the value. I’m not sure that actually makes a difference, the reader seems to ignore the entry for the top-level manifest digest, but that needs tracking down.

@packit-as-a-service

Packit jobs failed. @containers/packit-build please check.
