Define structure of signatures in OCI #400

Closed
3 tasks
stevvooe opened this issue Oct 19, 2016 · 20 comments

@stevvooe
Contributor

stevvooe commented Oct 19, 2016

For signatures to work and be compatible across implementations, we need to define two aspects:

  1. What is the scope of the statement being signed? Is it the manifest/list/config directly or do we include something with metadata, such as an annotated descriptor?
  2. Where are signing subsystems resolved and how are they structured? How do we balance "resolution" versus "abstraction" without sacrificing functionality?

Number 1 must come before number 2 or we risk a vertically integrated, incompatible mess.

To be clear, this will not be successful if this becomes a file-format discussion, as that won't solve the problem. We need to define the framework within which these formats can operate.

TL;DR: We need to define an interface to the signing world.

Context: #22 (comment)

Actions:

  • Define potential signing targets for integrated signature systems
  • Decide on the scope of the statement provided by signing a target
  • Ensure that 1.0 version of specification doesn't limit possibilities
@stevvooe stevvooe added this to the v1.0.0 milestone Oct 19, 2016
@runcom
Member

runcom commented Oct 19, 2016

/cc @mtrmac @aweiteka

@wking
Contributor

wking commented Oct 20, 2016

On Wed, Oct 19, 2016 at 02:38:42PM -0700, Stephen Day wrote:

  1. What is the scope of the statement being signed? Is it the
    manifest/list/config directly or do we include something with
    metadata, such as an annotated descriptor?

    Number 1 must come before number 2 or we risk a vertically
    integrated, incompatible mess.

Can we just focus on (1) here then? These discussions are complicated
enough without knowingly biting off multiple pieces at once ;).

My personal position on the signing scope is that folks should be
signing an assertion (like the name-assertion object I float in #176).
Without an assertion like that, the signature says “I think this blob
is good for something” but it's not clear what you think it's good
for. With a name-assertion, it's clear that you mean “I think this
blob represents debian:7” (or whatever the name is that you're
asserting).

Where the name being asserted lives doesn't really matter [1]. You
could put it:

a. In a generic named-assertion media type with its own blob (like
#176 suggests).
b. As a ‘name’ field in the signed object (e.g. in
application/vnd.oci.image.manifest.list.v1+json,
application/vnd.oci.image.manifest.v1+json, and
application/vnd.oci.image.config.v1+json. This sounds like your
“the manifest/list/config directly”).
c. As a ‘name’ field in the descriptor schema, naming the referenced
object (this sounds like your “annotated descriptor”).

The (small) differences between the approaches are:

  • Latency: (a) requires an extra blob retrieval which (b) and (c) do
    not. You can work around that with HTTP/2-style “you'll probably
    want this too” pushes, but still, it will be more work to make (a)
    as performant.
  • Composability: (b) and (c) require existing handlers to be updated
    when a new assertion is added (e.g. you need to teach the
    application/vnd.oci.image.manifest.v1+json validator and unpacker
    about the new ‘name’ field). (a) requires a new handler per
    assertion type, but you don't need to adjust the existing handlers.
  • Flexibility: (b) limits nameable blobs to those who have a known way
    to extract the name from the blob itself. Both (a) and (c) allow
    you to name arbitrary blobs.
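
To make the three placements concrete, here is a minimal Python sketch. The ‘name’ and ‘blob’ field names follow the proposals in this thread and #176; none of them is part of the specification, the `descriptor` helper is hypothetical, and the manifest is a stripped-down placeholder.

```python
import hashlib
import json


def descriptor(media_type: str, payload: bytes) -> dict:
    """Build a content descriptor (mediaType, digest, size) for a blob."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(payload).hexdigest(),
        "size": len(payload),
    }


MANIFEST_TYPE = "application/vnd.oci.image.manifest.v1+json"
manifest = {"schemaVersion": 2, "layers": []}  # placeholder manifest
manifest_bytes = json.dumps(manifest, sort_keys=True).encode()

# (a) Separate name-assertion blob: the manifest is untouched, but a
#     consumer has to fetch one more blob to learn the name.
name_assertion = {
    "name": "debian:7",
    "blob": descriptor(MANIFEST_TYPE, manifest_bytes),
}

# (b) Hypothetical 'name' field inside the signed object: no extra fetch,
#     but every manifest handler has to learn about the new field.
named_manifest = dict(manifest, name="debian:7")

# (c) Hypothetical 'name' field on the descriptor that references the
#     object: the blob is named from the outside, so any blob can be named.
named_descriptor = dict(descriptor(MANIFEST_TYPE, manifest_bytes), name="debian:7")
```

In (a) and (c) the manifest bytes, and so the manifest digest, are the same with or without the name; in (b) the name is part of the bytes being hashed.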

@stevvooe
Contributor Author

@wking As was expected, my point is completely lost to you. Again, this is not a naming or format discussion. We need to choose the signing targets.

Please refrain from commenting on this issue further before consulting with the maintainers. We must keep this focused and in scope or no progress will be made.

@aecolley

Intuitively, users who use signing are likely to refer to an image using a name, and are going to assume that the image retrieved is actually signed and named by one of the configured trusted signers. For this reason, I suggest that the signature has to cover the image name (including the ref tag) and one of the content-hashes. I think it's reasonable to make an exception for the ref tag latest.

In particular, I want to be sure that it isn't possible for an attacker to substitute the signed but old-and-vulnerable fooserver:1.0 image for the signed but new-and-secure fooserver:1.1 image that the user requested.

I'm saying nothing about how the image is discovered or how the name is resolved or structured, or where signers come from.
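
A minimal Python sketch of the substitution check described above, assuming the signature over the statement has already been verified against a trusted key. `SignedStatement` and `check_statement` are hypothetical names, and the image contents are placeholders.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedStatement:
    """Payload covered by a signature that was already verified
    (the verification itself is out of scope for this sketch)."""
    name: str    # full image name including the tag, e.g. "fooserver:1.1"
    digest: str  # content hash of the image, "sha256:<hex>"


def check_statement(requested: str, content: bytes, stmt: SignedStatement) -> bool:
    """Accept the content only if the statement names what was requested
    and its digest matches the bytes that were actually retrieved."""
    actual = "sha256:" + hashlib.sha256(content).hexdigest()
    return stmt.name == requested and stmt.digest == actual


old = b"fooserver 1.0 image placeholder"
new = b"fooserver 1.1 image placeholder"
stmt_old = SignedStatement("fooserver:1.0", "sha256:" + hashlib.sha256(old).hexdigest())
stmt_new = SignedStatement("fooserver:1.1", "sha256:" + hashlib.sha256(new).hexdigest())

# An attacker answers a request for fooserver:1.1 with the old image and
# its genuine signature; the name check rejects the substitution.
print(check_statement("fooserver:1.1", old, stmt_old))  # False
print(check_statement("fooserver:1.1", new, stmt_new))  # True
```

A signature that covered only the content hash would pass the digest comparison in the first case, which is why the name is part of the signed statement here.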

@wking
Contributor

wking commented Oct 23, 2016

On Sat, Oct 22, 2016 at 10:07:59PM -0700, Adrian Colley wrote:

Intuitively, users who use signing are likely to refer to an image using a name, and are going to assume that the image retrieved is actually signed and named by one of the configured trusted signers. For this reason, I suggest that the signature has to cover the image name (including the ref tag) and one of the content-hashes.

I'm pretty sure we want to have the signatures and signed objects in CAS, since that makes them easy to mirror and distribute. #176 proposes structures for doing that, although you can keep the name information in other places too if you don't want a separate name-assertion blob. You don't want to sign the ref (because the ref is mutable data that lives outside of CAS), but a name-assertion is a lot like a ref that has moved into CAS (it's just a name string and a blob descriptor, #176). An example procedure for using signed name-assertions to validate names on unpacking is here.

@wking
Contributor

wking commented Oct 23, 2016

On Sat, Oct 22, 2016 at 10:07:59PM -0700, Adrian Colley wrote:

In particular, I want to be sure that it isn't possible for an attacker to substitute the signed but old-and-vulnerable fooserver:1.0 image for the signed but new-and-secure fooserver:1.1 image that the user requested.

IPFS addresses this by providing a number of validity schemes. I expect we'll want something similar here, but “valid forever” assertions (with no additional validation data like validity-range, time-to-live, or parent) seem like a good place to start. Once we agree on that, we can come back in a subsequent round and add more granular validity schemes.

@aweiteka

The signature format that @mtrmac and others have implemented against is proposed as a specification here: https://github.com/containers/image/pull/59/files. Example:

{
    "critical": {
        "identity": {
            "docker-reference": "docker.io/library/busybox"
        },
        "image": {
            "docker-manifest-digest": "sha256:a59906e33509d14c036c8678d687bd4eec81ed7c4b8ce907b888c607f6a1e0e6"
        },
        "type": "atomic container signature"
    },
    "optional": {
        "creator": "atomic 0.1.0-dev",
        "timestamp": 1471035347
    }
}

Before quibbling about specific details (data structure, values, etc), does this make sense at a high level?
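
For orientation, a consumer of the example above might check it roughly as in the following Python sketch. It assumes the JSON has already been extracted from a verified signature, and it assumes (from the name "critical", not from anything stated in this thread) that a consumer should reject claims whose critical members it does not recognise. `check_claims` and the placeholder manifest are hypothetical.

```python
import hashlib
import json

KNOWN_CRITICAL = {"identity", "image", "type"}


def check_claims(claims_json: str, expected_reference: str, manifest: bytes) -> bool:
    """Check the 'critical' claims against the requested reference and
    the manifest bytes that were actually retrieved."""
    critical = json.loads(claims_json).get("critical", {})
    # Assumption: every critical member must be one this consumer knows.
    if set(critical) != KNOWN_CRITICAL:
        return False
    if critical["type"] != "atomic container signature":
        return False
    if critical["identity"].get("docker-reference") != expected_reference:
        return False
    digest = "sha256:" + hashlib.sha256(manifest).hexdigest()
    return critical["image"].get("docker-manifest-digest") == digest


manifest = b'{"schemaVersion": 2}'  # placeholder manifest bytes
claims = json.dumps({
    "critical": {
        "identity": {"docker-reference": "docker.io/library/busybox"},
        "image": {
            "docker-manifest-digest": "sha256:" + hashlib.sha256(manifest).hexdigest()
        },
        "type": "atomic container signature",
    },
    # Members under 'optional' are informational and not checked here.
    "optional": {"creator": "atomic 0.1.0-dev", "timestamp": 1471035347},
})

print(check_claims(claims, "docker.io/library/busybox", manifest))  # True
print(check_claims(claims, "docker.io/library/alpine", manifest))   # False
```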

@aecolley

@aweiteka Just two quibbles. The JSON you quoted describes itself as a signature; but it's actually a document which is signed, and the signature is stored elsewhere. The PR you linked to proposes encrypting the document as well as signing it, which makes no sense. Other than those, and ignoring the details of how the JSON might be integrated into OCI and how keys might be discovered, it makes sense to me.

@stevvooe
Contributor Author

@aweiteka As I outlined in this issue, this cannot be a format discussion. If I'm understanding your proposal, it seems like you want agreement on the scope of a signing statement.

Before we do this, the first agreement needs to be made on a potential signing target. This needs to be one of the existing formats, perhaps a manifest or manifest list. Then we can talk about the scope of the statement made by the signature.

I've separated out an action on this issue that can help to address the confusion.

@wking
Contributor

wking commented Oct 24, 2016

On Mon, Oct 24, 2016 at 12:10:39PM -0700, Stephen Day wrote:

Before we do this, the first agreement needs to be made on a potential signing target. This needs to be one of the existing formats, perhaps a manifest or manifest list.

I think there are useful workflows around signing configs, manifests, and manifest-lists (although there are pros and cons to each choice). I'm happy to go over my arguments again, but it might be more productive if folks who think there is a single reasonable choice pitch their case for that choice.

@aecolley

Maybe it's a matter of terminology, but I don't see any difference between the target of a signature and the scope of the signature. Either way, it is the thing whose integrity is protected against undetected malicious change.

The manifest and manifest list are both fine targets, as is the descriptor: all of them contain a content-hash which can secure the rest of the download by a client. I don't think the image config will do because it doesn't have hashes for the compressed layers. Clients need the length of each layer file before downloading so they can't be attacked with large files to provoke memory exhaustion, something the descriptor spec protects against quite well.
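
The size protection mentioned above can be sketched in Python as a bounded read: the client stops as soon as more bytes arrive than the descriptor announced, then checks the digest. `read_verified` is a hypothetical helper, and the sketch only handles sha256 digests.

```python
import hashlib
import io


def read_verified(stream, descriptor: dict, chunk_size: int = 64 * 1024) -> bytes:
    """Read a blob, aborting once it exceeds descriptor['size'] and
    checking the sha256 digest when the stream ends."""
    expected_size = descriptor["size"]
    algorithm, _, expected_hex = descriptor["digest"].partition(":")
    if algorithm != "sha256":
        raise ValueError("this sketch only handles sha256 digests")
    hasher = hashlib.sha256()
    chunks = []
    received = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        received += len(chunk)
        if received > expected_size:
            # Stop before buffering anything beyond the announced size.
            raise ValueError("blob is larger than the descriptor allows")
        hasher.update(chunk)
        chunks.append(chunk)
    if received != expected_size:
        raise ValueError("blob is shorter than the descriptor says")
    if hasher.hexdigest() != expected_hex:
        raise ValueError("digest mismatch")
    return b"".join(chunks)


layer = b"placeholder layer bytes"
desc = {"digest": "sha256:" + hashlib.sha256(layer).hexdigest(), "size": len(layer)}
print(read_verified(io.BytesIO(layer), desc) == layer)  # True
```

A digest alone would only reveal the oversized blob after the whole download; the signed size lets the client give up early.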

There's obvious value in allowing different signers for different manifests referenced by the same manifest list. So that's my suggestion: manifests should be signed.

@wking
Contributor

wking commented Oct 24, 2016

On Mon, Oct 24, 2016 at 01:42:37PM -0700, Adrian Colley wrote:

I don't think the image config will do because it doesn't have hashes for the compressed layers.

But it does have diffIDs, so the trade-off is potential gzip-bomb exposure vs. not having to re-sign after layer re-compression.
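
A small Python illustration of that trade-off, using a placeholder byte string in place of a real layer tar archive: recompressing a layer changes the digest that a manifest would reference, while the digest of the uncompressed bytes (the diffID) stays the same.

```python
import gzip
import hashlib


def sha256(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


layer_tar = b"stand-in for an uncompressed layer tar archive\n" * 100

# Two parties compress the same layer with different compression levels
# and different gzip header timestamps.
first = gzip.compress(layer_tar, compresslevel=1, mtime=0)
second = gzip.compress(layer_tar, compresslevel=9, mtime=1)

# The manifest references the compressed blobs, so its layer digests
# (and with them the manifest digest) depend on the compression.
print(sha256(first) == sha256(second))  # False

# The config lists diffIDs, the digests of the uncompressed layers,
# which are unaffected by recompression.
print(sha256(gzip.decompress(first)) == sha256(gzip.decompress(second)))  # True
```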

@wking
Contributor

wking commented Oct 24, 2016

On Mon, Oct 24, 2016 at 01:42:37PM -0700, Adrian Colley wrote:

There's obvious value in allowing different signers for different manifests referenced by the same manifest list. So that's my suggestion: manifests should be signed.

And an unsigned manifest-list opens manifest-list consumers up to denial-of-service type attacks where a malicious intermediate puts bogus entries into the manifest-list for a given platform.

An attacker could also spoof some platform settings (platform.os.features, platform.variant, …) which are currently only exposed in the manifest list. Although I'd like to close this particular vulnerability by using a single platform structure (probably defined in runtime-spec) for all the places where we talk about platforms.

@stevvooe
Contributor Author

@aecolley While the target implies scope, we already have hash-stable data structures that make a very nice target for signing. When deciding on the target, we are saying that a signature should include the hash of that target.

When we talk about scope, we are referring to the semantics of the statement. This includes a superset of what is asserted in the signing target. At minimum, we can make the strong statement that manifest h(X) is signed by key Y. We can increase scope by saying that S + h(X) is signed by key Y, where S are some statements about what is asserted by the holder of the key.

So, this discussion must first decide on what X refers to, the signing target (probably manifest and manifest-list), and the extent of S. We also need to decide how to store and resolve sig(S + h(X)), in addition to S + h(X).
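
The notation can be made concrete with a short Python sketch. The JSON layout of the statement is invented for illustration, and HMAC is used only as a standard-library stand-in for a real public-key signature by key Y; neither is proposed anywhere in this thread.

```python
import hashlib
import hmac
import json


def h(blob: bytes) -> str:
    """h(X): the content hash of the signing target."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def statement(target: bytes, claims: dict) -> bytes:
    """Serialize S + h(X): the claims S together with the target's hash."""
    doc = {"target": h(target), "claims": claims}
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode()


def sign(key: bytes, payload: bytes) -> str:
    # Stand-in for a public-key signature; HMAC needs a shared secret,
    # which a real signing system would not use.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(key: bytes, payload: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(key, payload), signature)


key_y = b"demo key Y"
manifest_x = b'{"schemaVersion": 2}'  # placeholder target X

minimal = statement(manifest_x, {})                       # "h(X) is signed by Y"
wider = statement(manifest_x, {"name": "fooserver:1.1"})  # S + h(X)
sig = sign(key_y, wider)

print(verify(key_y, wider, sig))    # True
print(verify(key_y, minimal, sig))  # False: S is covered by the signature
```

Both sig(S + h(X)) and the statement S + h(X) itself have to reach the verifier, which is the storage and resolution question raised above.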

@wking
Contributor

wking commented Oct 25, 2016

On Mon, Oct 24, 2016 at 08:18:27PM -0700, Stephen Day wrote:

At minimum, we can make the strong statement that manifest h(X) is signed by key Y.

I think “blob X is signed by Y” is pretty useless unless it's clear what that signature is asserting. In the absence of an explicit assertion, I expect it is “everything in the Merkle tree rooted at X is perfect” which seems like a high bar to meet. There are a number of more restricted assertions we could focus on, and I think “the name of blob X is Z” is a reasonable starting point. Once we work out the name-assertion framework, we can come back and consider additional assertions (e.g. “has been audited for security with a report at URL Z”) in future issues.

So, this discussion must first decide on what X refers to, the signing target (probably manifest and manifest-list)…

Do you not see enough benefit in the layer-re-compression-agnostic config signatures to consider supporting signed name-assertions on them?

We also need to decide how to store and resolve sig(S + h(X)), in addition to S + h(X).

I suggest postponing this until we've settled on an initial assertion S (e.g. “the name of blob X is Z”) and a set of valid target types (e.g. “any type you like” or “just manifests and manifest-lists”).

@aecolley

Manifest annotations can be used for any security assertions that we'd like the signature to represent. I'd prefer to be able to sign any element of the set { refs descriptor, manifest list, or manifest }; but it's sufficient for the spec to provide for a signature on only one of them (even the otherwise-optional manifest list). Signatures are likely to be associated with professional release engineering processes, so regenerating signatures for refs, manifest lists or manifests isn't going to be a problem one way or another.

It is least work (and least overhead) to sign refs; but it is maximum flexibility to sign manifests, because different keys can be used by different organizational units with responsibility for different architectures. Actually, who does that? The simplicity of signed refs is starting to appeal to me. 😐

As for signature storage, it's sufficient to use detached .asc files. This would require accommodation in the spec to permit the signature files to be present in the refs or blobs directories, and to forbid refs ending in ".asc". Alternatively, the signatures could be embedded in the JSON, which would mean reserving key names in the descriptor spec.

@wking
Contributor

wking commented Oct 26, 2016

On Tue, Oct 25, 2016 at 05:05:17PM -0700, Adrian Colley wrote:

Signatures are likely to be associated with professional release
engineering processes, so regenerating signatures for refs, manifest
lists or manifests isn't going to be a problem one way or another.

This assumes that the original signers remain involved in ongoing
maintenance for their images. Some of the historical trouble with
signature support was when the image modeling changed. The folks
running the registry understandably want to make life easier for image
consumers by doing on-the-fly translation to the newer formats. But
you can't do that safely without re-signing the translated blobs,
unless the signature is low enough in the Merkle chain to duck under
the translated blobs. With signed name assertions and such on the
config whose diffIDs protect the validity of the unpacked image, the
registry maintainers and other intermediates are free to recompress
layers, reroll manifests, manifest-lists, etc. as they see fit.

Of course, config-level signatures mean that less-well-intentioned
intermediates can translate the unsigned blobs without detection as
well. So you get increased exposures to gzip-bombs, layer
misdirection, forged platform entries in the manifest-list, etc., etc.
Where each publisher and consumer comes down on the tradeoffs is a
policy decision that should be left up to them.

It is least work (and least overhead) to sign refs…

You can't actually do that, because you need a serialized byte-stream
to sign, and refs are a ref-name ↔ descriptor pair without an
unambiguous serialization. If you wanted to sign refs, you'd need to
invent a serialization for them. If you pick JSON, store the ref name
in a ‘name’ property, and store the descriptor in a ‘blob’ property,
you end up with my name-assertion type (#176). If you pick a slightly
different schema, keep only the digest hash, and add properties for
optional metadata, you end up with @aweiteka's format [2]. And
@stevvooe wants to keep this issue from descending to that level of
detail anyway [3]. If we reach a consensus in this issue that we do
want to define a name-assertion schema, then we can open a new issue
or PR to hash out the best schema for that task. I have yet to hear
anyone arguing against a name-assertion schema though, maybe we
already have a consensus around wanting one?

As for signature storage, it's sufficient to use detached .asc
files. This would require accommodation in the spec to permit the
signature files to be present in the refs or blobs directories…

Signatures can already live in image-spec's blobs directory, which
will happily store any opaque byte stream you throw at it [4].

… and to forbid refs ending in ".asc".

I think we should punt on signature discovery until we pick at least
one signable assertion structure ;). Maybe that's
“application/vnd.oci.image.manifest.v1+json is a signable assertion,
and signing it means…”. And maybe that's
“application/vnd.oci.image.named.blob.v1+json is a signable assertion,
and signing it means that the name in ‘name’ applies to the blob
referenced by ‘blob’”. But signature discovery seems orthogonal to
picking signable assertions, and picking signable assertions has
proven difficult enough on its own ;).
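
The earlier point that a ref has no unambiguous serialization can be illustrated with a Python sketch. The ‘name’ and ‘blob’ properties follow the description above; the digest and size are placeholders, and the `blobs` dict is an in-memory stand-in for a content-addressable store.

```python
import hashlib
import json

ref_name = "v1.0"
ref_target = {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "digest": "sha256:" + "0" * 64,  # placeholder digest
    "size": 1234,                    # placeholder size
}


def serialize(name: str, descriptor: dict, **dumps_options) -> bytes:
    """Turn the (ref name, descriptor) pair into bytes that can be signed."""
    return json.dumps({"name": name, "blob": descriptor}, **dumps_options).encode()


compact = serialize(ref_name, ref_target, sort_keys=True, separators=(",", ":"))
pretty = serialize(ref_name, ref_target, indent=2)

# Same logical ref, different bytes: a signature over one serialization
# says nothing about the other, so the signed bytes must be kept as-is.
print(json.loads(compact) == json.loads(pretty))  # True
print(compact == pretty)                          # False

blobs = {}  # in-memory stand-in for the blobs directory


def put(blob: bytes) -> str:
    """Store opaque bytes under their own digest and return the digest."""
    digest = "sha256:" + hashlib.sha256(blob).hexdigest()
    blobs[digest] = blob
    return digest


assertion_digest = put(compact)
```

Storing the serialized assertion by digest is what moves the ref-like data into CAS; a detached signature over the same bytes could be stored with `put` in the same way.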

 Subject: Signature verification after image-format translation
 Date: Fri, 15 Apr 2016 14:50:18 -0700
 Message-ID: <20160415215018.GR23066@odin.tremily.us>

@wking
Contributor

wking commented Nov 4, 2016

On Wed, Oct 26, 2016 at 08:47:00AM -0700, W. Trevor King wrote:

If we reach a consensus in this issue that we do want to define a
name-assertion schema, then we can open a new issue or PR to hash
out the best schema for that task. I have yet to hear anyone
arguing against a name-assertion schema though, maybe we already
have a consensus around wanting one?

More than a week with no “I don't think we have a consensus yet”, so
I've filed my take on name assertions in #445.

@stevvooe
Contributor Author

More than a week with no “I don't think we have a consensus yet”, so
I've filed my take on name assertions in #445.

Again, you've hijacked the conversation with nonsense. I've closed #445.

Please refrain from commenting unless you have something valuable to add to the conversation.

@vbatts
Member

vbatts commented Apr 2, 2021

Good history here, but any ongoing signature format discussions can open new issues as needed.

@vbatts vbatts closed this as completed Apr 2, 2021