Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support sha512 content #543

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sudo-bmitch
Copy link
Contributor

@sudo-bmitch sudo-bmitch commented Jun 20, 2024

Fixes #494.

Pending tasks before removing this from draft:

  • Redesign conformance tests in a separate PR to make a matrix of types of content and registry APIs to test
  • Add conformance tests
  • Get buy-in from implementations of clients and registries

spec.md Show resolved Hide resolved
@sudo-bmitch sudo-bmitch force-pushed the pr-digest-on-tag-push branch from b347347 to 811619c Compare July 13, 2024 20:55
@sudo-bmitch sudo-bmitch changed the title Allow client to specify a digest when pushing by tag Support sha512 content Jul 13, 2024
@sudo-bmitch
Copy link
Contributor Author

CI is failing on this because Zot will need to be updated for the changed APIs. I've been testing against olareg so I know this can work. This gives me more reasons to want to rewrite the conformance tests to focus on a matrix of content and APIs to test. The current tests are getting difficult to follow and manage.

@sudo-bmitch
Copy link
Contributor Author

@rchincha it would be good to get your feedback on this, and whether Zot would be able to support it.

@rchincha
Copy link
Contributor

project-zot/zot#2075
^ on zot side

@@ -329,6 +331,12 @@ The process remains unchanged for chunked upload, except that the post request M
Content-Length: 0
```

When pushing a blob with a digest algorithm other than `sha256`, the post request SHOULD include the `digest-algorithm` parameter:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this new query param? We already send the expected digest (including the algorithm) as a mandatory parameter when closing the blob upload.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two reasons I can think of. It's a courtesy to registries that compute the digest as the blob is received and don't want to reread the blob to recompute it with a new algorithm. It's also a way for clients to get a fast failure if they request an algorithm not supported by the registry, before spending the time and bandwidth to upload the blob.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason this is a query param instead of a header? so API endpoint defs don't change (?)
What are our guidelines regarding what gets set as a header and what as a query param.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use query params in a lot of other parts of the spec. Is there a reason this should be a header?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the POST, PATCH, PUT sequence, we could have alternatively modeled this as 'Content-Type'/'Accept' type header exchange.
'OCI-Digest-Type'/'OCI-Accept-Digest-Type'? in the POST step itself.

^ just a thought!

In general, older registries will reject in the PUT step, and we cannot assume that registries are implementing streamed hashing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, older registries will reject in the PUT step, and we cannot assume that registries are implementing streamed hashing.

Correct, the failure condition on older registries requires the blob to be pushed first. This gives newer registries the ability to improve performance of the blob push.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here... the client may include sha256 to ensure it is supported, this for future use cases presuming there will eventually be replacement(s)

@andaaron
Copy link

andaaron commented Jul 19, 2024

Hi @sudo-bmitch

project-zot/zot#2075 ^ on zot side

This one was an intermediate step to replace hardcoded sha256 logic.

This one project-zot/zot#2554 passes for me locally with the conformance tests in this PR. It implements the digest query parameter for pushing manifests.

@rchincha
Copy link
Contributor

@sudo-bmitch at least in "draft" state, maybe update zot to point to @andaaron 's tree/commit.
andaaron/zot@6812b5c


`/v2/<name>/blobs/uploads/?digest-algorithm=<algorithm>` <sup>[end-4c](#endpoints)</sup>

Here, `<digest-algorithm>` is the algorithm the registry should use for the blob, e.g. `digest-algorithm=sha512`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a minor nit, but shouldn't the example here still be sha256 since we don't want to accidentally encourage anyone to actually use sha512?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter isn't needed for sha256. That's specified on line 334. I think sha512 is the only other valid digest algorithm today that can be specified here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with the idea of at least showing both in the examples... explaining it is normally default.. and testing...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecating 256 at some point would be hard if the default for this new parameter is not specified. I general leaning towards explicit even for sha256.

Copy link
Contributor Author

@sudo-bmitch sudo-bmitch Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a registry sees a request without a digest-algorithm, are we saying the registry can pick any blob algorithm it wants to, and if the client specifies a different one on the PUT then the request could fail after the blow was uploaded? I'd feel a lot safer if a missing algorithm is treated the same as a legacy client that is using sha256. In which case there won't be a difference between not specifying the parameter and specifying the parameter with sha256.

I'll still add it if that's the community desire, but I worry there may be registries that don't recognize the parameter and reject the request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec doesn't seem to officially say that's the expected behavior, so it really doesn't seem unreasonable for it to do something different than sha256 if it has a reason to.

@sudo-bmitch
Copy link
Contributor Author

sudo-bmitch commented Jul 20, 2024

@sudo-bmitch at least in "draft" state, maybe update zot to point to @andaaron 's tree/commit. andaaron/zot@6812b5c

@rchincha I need the changes for this:

distribution-spec/Makefile

Lines 95 to 105 in 11b8e3f

TEST_REGISTRY_CONTAINER ?= ghcr.io/project-zot/zot-minimal-linux-amd64:v2.0.0-rc6@sha256:bf95a94849cd9c6f596fb10e5a2d03b74267e7886d1ba0b3dab33337d9e46e5c
registry-ci:
docker rm -f oci-conformance && \
mkdir -p $(OUTPUT_DIRNAME) && \
echo '{"distSpecVersion":"1.1.0-dev","storage":{"rootDirectory":"/tmp/zot","gc":false,"dedupe":false},"http":{"address":"0.0.0.0","port":"5000"},"log":{"level":"debug"}}' > $(shell pwd)/$(OUTPUT_DIRNAME)/zot-config.json
docker run -d \
-v $(shell pwd)/$(OUTPUT_DIRNAME)/zot-config.json:/etc/zot/config.json \
--name=oci-conformance \
-p 5000:5000 \
$(TEST_REGISTRY_CONTAINER) && \
sleep 5

Do you have a container image to point to?

@rchincha
Copy link
Contributor

@sudo-bmitch Thinking about this some more ...

Maybe split this work into two parts - 1) first the ability to handle a non-sha256 hash so no further API changes, and 2) the ability to optimize via streaming hints. It may turn out that 2) may not be needed at all. Thoughts?

@sudo-bmitch
Copy link
Contributor Author

@sudo-bmitch Thinking about this some more ...

Maybe split this work into two parts - 1) first the ability to handle a non-sha256 hash so no further API changes, and 2) the ability to optimize via streaming hints. It may turn out that 2) may not be needed at all. Thoughts?

@rchincha I'm not sure I follow. What would you include in the first part, the part that has no API changes?

@rchincha
Copy link
Contributor

@sudo-bmitch Thinking about this some more ...
Maybe split this work into two parts - 1) first the ability to handle a non-sha256 hash so no further API changes, and 2) the ability to optimize via streaming hints. It may turn out that 2) may not be needed at all. Thoughts?

@rchincha I'm not sure I follow. What would you include in the first part, the part that has no API changes?

I would focus only on making PUT /v2/...blob/?digest=sha512:... work first.
Thoughts?

@sudo-bmitch
Copy link
Contributor Author

I would focus only on making PUT /v2/...blob/?digest=sha512:... work first.
Thoughts?

@rchincha since that's already in the spec, what would OCI need to do in that scenario? Does that involve closing this PR and doing nothing? Would we leave #494 unresolved?

@rchincha
Copy link
Contributor

rchincha commented Jul 23, 2024

I would focus only on making PUT /v2/...blob/?digest=sha512:... work first.
Thoughts?

@rchincha since that's already in the spec, what would OCI need to do in that scenario? Does that involve closing this PR and doing nothing?

I think we still need to resolve the question with guidelines of mixing and matching digests. Here is a sequence (really #494 (comment) rephrased)

  1. Customer one builds an image (say A) with sha256 and signs and pushes to a registry
  2. Customer two builds another image (say B) derived from A but with sha512 and push to a registry
    2.1 If the manifest is pushed as sha512, do we then convert everything to sha512?
    2.2 Or do we not? coz signatures and sucks for registries, and I would like to keep the ability to trace back image lineage etc.
    2.3 If registries federate and pull from other registries, do they do 2.1 or 2.2 above?
    2.4 Are clients supposed to do some of the above instead?
  3. Nice that we now support multiple digests, but what if one of them is broken, what is the path forward. A bad hash algo will bite us maybe 10 years from now.
    etc, etc ...

We can lean on some lessons from other content-addressable projects:
https://lwn.net/Articles/898522/

My concern (IMHO) is we have not set a high-level direction or rationale yet for above first.

Would we leave #494 unresolved?

No, will resolve shortly after.

@sudo-bmitch
Copy link
Contributor Author

sudo-bmitch commented Jul 23, 2024

@rchincha

I think we still need to resolve the question with guidelines of mixing and matching digests. Here is a sequence (really #494 (comment) rephrased)

  1. Customer one builds an image (say A) with sha256 and signs and pushes to a registry

  2. Customer two builds another image (say B) derived from A but with sha512 and push to a registry
    2.1 If the manifest is pushed as sha512, do we then convert everything to sha512?

Once the manifest is pushed to the registry, the registry cannot modify it. Doing so would break pinned references and violate the content addressable guarantee.

Before the push, it's a client decision of how they generate the manifest. I'd recommend against changing the digest of layers from a base image. Changing the digest of blobs implies foregoing cross-repository blob mounts, increases the number of copies of content on a registry and that need to be pulled to the client, and breaks tools that support rebasing images.

2.2 Or do we not? coz signatures and sucks for registries, and I would like to keep the ability to trace back image lineage etc.

This.

2.3 If registries federate and pull from other registries, do they do 2.1 or 2.2 above?

Mirrors and the likes should not be modifying the manifest. They need to retain the content addressable guarantee.

2.4 Are clients supposed to do some of the above instead?

Clients are free to do anything when building. If they modify content and change the digest, it's now their responsibility.

  1. Nice that we now support multiple digests, but what if one of them is broken, what is the path forward. A bad hash algo will bite us maybe 10 years from now.

What if the broken algorithm is sha256 and we don't have a way to push content with other digest algorithms? What if registries respond to a broken algorithm by changing their default, breaking the ability to sign images and other uses of referrers? Will clients have to be configured differently for each registry they talk to? My goal is to solve this so we aren't stuck in one of these bad scenarios.

We can lean on some lessons from other content-addressable projects: https://lwn.net/Articles/898522/

My concern (IMHO) is we have not set a high-level direction or rationale yet for above first.

My goal is just to make it possible to push content with other algorithms, focusing on the currently supported algorithms first. Once that is done, I think we have a path forward to add new algorithms (keeping in mind that content built with a new algorithm is not portable to older registries or usable by older downstream consumers).

I'm not focused on which algorithms we add in the future (that's a question for image-spec). And I think any security questions should get lumped into that discussion, which we can have later.

Would we leave #494 unresolved?

No, will resolve shortly after.

It can't be after. Without resolving #494 we'll never get to 2, because it's not possible to push a tagged manifest with anything other than the registry desired digest algorithm (hence the reason sudobmitch/demo:sha512 is not referenced with a sha512 digest itself). That also means it's not possible to push referrers to a tagged manifest unless the referrers happen to use that same algorithm that the registry defaults to.

@rchincha
Copy link
Contributor

Can we capture all this (rationale) somewhere?
"This is how OCI is thinking about this" ...

@andaaron
Copy link

@sudo-bmitch at least in "draft" state, maybe update zot to point to @andaaron 's tree/commit. andaaron/zot@6812b5c

@rchincha I need the changes for this:

distribution-spec/Makefile

Lines 95 to 105 in 11b8e3f

TEST_REGISTRY_CONTAINER ?= ghcr.io/project-zot/zot-minimal-linux-amd64:v2.0.0-rc6@sha256:bf95a94849cd9c6f596fb10e5a2d03b74267e7886d1ba0b3dab33337d9e46e5c
registry-ci:
docker rm -f oci-conformance && \
mkdir -p $(OUTPUT_DIRNAME) && \
echo '{"distSpecVersion":"1.1.0-dev","storage":{"rootDirectory":"/tmp/zot","gc":false,"dedupe":false},"http":{"address":"0.0.0.0","port":"5000"},"log":{"level":"debug"}}' > $(shell pwd)/$(OUTPUT_DIRNAME)/zot-config.json
docker run -d \
-v $(shell pwd)/$(OUTPUT_DIRNAME)/zot-config.json:/etc/zot/config.json \
--name=oci-conformance \
-p 5000:5000 \
$(TEST_REGISTRY_CONTAINER) && \
sleep 5

Do you have a container image to point to?

Hi @sudo-bmitch, can you try out ghcr.io/andaaron/zot-minimal-linux-amd64:v2.1.0-manifest-digest?

@sudo-bmitch
Copy link
Contributor Author

Can we capture all this (rationale) somewhere? "This is how OCI is thinking about this" ...

@rchincha I think most of this is covered by "The registry MUST store the manifest in the exact byte representation provided by the client" but feel free to open a PR and we'll discuss there.

@rchincha
Copy link
Contributor

Can we capture all this (rationale) somewhere? "This is how OCI is thinking about this" ...

@rchincha I think most of this is covered by "The registry MUST store the manifest in the exact byte representation provided by the client" but feel free to open a PR and we'll discuss there.

Let me take a stab at the text/content. Not sure where it should land yet. Maybe in the commit msg of this PR itself.
IMO, important that an uninformed reader understands the context of this change and also for our sake.

@sudo-bmitch sudo-bmitch force-pushed the pr-digest-on-tag-push branch from 811619c to 04c6343 Compare July 24, 2024 18:46
@rchincha
Copy link
Contributor

#547

@sudo-bmitch sudo-bmitch force-pushed the pr-digest-on-tag-push branch from 04c6343 to 26643ef Compare July 24, 2024 20:07
Signed-off-by: Brandon Mitchell <git@bmitch.net>
@sudo-bmitch sudo-bmitch force-pushed the pr-digest-on-tag-push branch from 26643ef to 063de3f Compare July 24, 2024 20:11
@sudo-bmitch
Copy link
Contributor Author

Hi @sudo-bmitch, can you try out ghcr.io/andaaron/zot-minimal-linux-amd64:v2.1.0-manifest-digest?

@andaaron thanks, I've got it working with that image (along with some other cleanups).

@sudo-bmitch
Copy link
Contributor Author

sudo-bmitch commented Jul 26, 2024

Something pointed out by @thaJeztah is this text from image-spec:

Implementations MAY implement SHA-512 digest verification for use in descriptors.

Without a MUST, that means this conformance test needs to be optional, and any content generated with a different digest should be considered non-portable, not just in practice, but also according to the spec.

@rchincha
Copy link
Contributor

Without a MUST, that means this conformance test needs to be optional

dist-spec v1.2.0?

@sudo-bmitch
Copy link
Contributor Author

Without a MUST, that means this conformance test needs to be optional

dist-spec v1.2.0?

It would have to be image-spec v1.2 changing the MAY to MUST language, semi-independent from distribution-spec. Clients would still need to be prepared for registries and other clients on older releases of each.

For distribution-spec, I think the conformance test really needs to be a matrix style of test for the different types of content running across the various APIs. And an early failure to push the content should skip the remaining tests. I think we should also reorganize for a single path through the APIs, push, query, pull, delete, instead of the setup/teardown per test we have now. I'd want to do that as a separate PR from this, and then make this PR add a new entry to the types of content to test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clarify supported digest algorithms for manifests
7 participants