ref-engine-discovery: Add a ref-engine discovery protocol #2

Merged
merged 5 commits into from
Sep 9, 2017

Conversation

wking
Contributor

@wking wking commented Sep 6, 2017

This is mostly from this comment, but pulls in ideas from the rest of the discussion with @xiekeyang and @cyphar there, as well as the inspirational specs mentioned in the README. I haven't spent much time on casEngines in this PR, because I consider that orthogonal to ref-engine discovery. Helpful ref engines may populate casEngines in their Merkle roots without breaking the deeper Merkle tree.

This PR will hopefully make it easier to track my proposal as I continue to pivot in response to review.

@wking wking force-pushed the ref-engine-discovery branch 8 times, most recently from 8c18e71 to 37069dd Compare September 6, 2017 22:02
@wking
Contributor Author

wking commented Sep 6, 2017

I've pushed 8c18e71 → 37069dd, spinning the DNS-compatible image name bit out into its own micro-spec. This information is more generic than my ref-engine discovery protocol (where it had lived before), and is very similar to (and inspired by) @cyphar's distribution-uri ABNF. Differences from his ABNF:

  • I've used host instead of authority before the slash. Where they're needed, I expect users to be providing userinfo and port information via other channels, especially since docker etc. currently use : to delimit tags.
  • I've dropped segment-nz because the “default authority” option is not portable. I've mentioned it as a potential extension.
  • I've added an optional fragment, which, depending on how folks process the index, may allow you to use the same index URI template for multiple image names (somewhat like Docker's tags).

@wking
Contributor Author

wking commented Sep 7, 2017

I've pushed 37069dd → 8d7eb73 adding a CAS protocol registry, oci-cas-template-v1, and the ability to optionally set casEngines in the ref-engine discovery response (as initially floated here). With this change, the current ref-engine discovery response is very similar to @cyphar's distribution object with its indexuris and bloburis, with the main differences being:

  • casEngines is optional. I expect it to be set most frequently in ref-engine responses, and only occasionally in ref-engine-discovery responses, but you don't have to set it anywhere if you don't want. @cyphar's bloburis is required.
  • I don't have @cyphar's discovery object indirection, although you could add something like that to my suite of protocols if you want to support folks with dumb well-known ref-engine discovery servers, dumb oci-index-template-v1 ref-engine servers, and unstable CAS engine providers.


```ABNF
dns-compatible-image-name = host "/" path-rootless [ "#" fragment ]
```
Owner

This conforms to https://tools.ietf.org/html/rfc3986#section-3, which seems to be compatible with both the DNS and non-DNS cases. The non-DNS scenario is not mentioned in your PR, but I think it can also be supported by the protocol we are discussing.

Contributor Author

This conforms to https://tools.ietf.org/html/rfc3986#section-3, which seems to be compatible with both the DNS and non-DNS cases.

I agree that these names could be resolved outside of DNS (e.g. via an /etc/hosts file). I've pushed 8d7eb73 → 292435d squashing some fixups that change dns-compatible-image-name to host-based-image-name, mention IP address forms, and SHOULD DNS-compatible host values (because otherwise using X.509 is hard).
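
As a rough illustration of the `host "/" path-rootless [ "#" fragment ]` rule discussed in this thread, here is a minimal Python sketch that splits a name into its three parts. The class and function names are hypothetical, the `1.0` fragment in the usage note is made up, and the sketch only checks the overall shape; it does not validate the characters allowed by RFC 3986.

```python
from typing import NamedTuple, Optional


class HostBasedImageName(NamedTuple):
    """Parts of a host-based image name (hypothetical helper type)."""
    host: str
    path: str
    fragment: Optional[str]


def parse_host_based_image_name(name: str) -> HostBasedImageName:
    """Split NAME into host, path-rootless, and optional fragment."""
    # Everything after the first "#" is the optional fragment.
    body, hash_sep, fragment = name.partition("#")
    # The host runs up to the first "/"; the remainder is the path.
    host, slash, path = body.partition("/")
    # path-rootless must begin with a non-empty segment, so reject
    # names with no host, no slash, no path, or a path starting with "/".
    if not host or not slash or not path or path.startswith("/"):
        raise ValueError("not a host-based image name: {!r}".format(name))
    return HostBasedImageName(host, path, fragment if hash_sep else None)
```

For example, `parse_host_based_image_name("a.b.example.com/c/d#1.0")` yields host `a.b.example.com`, path `c/d`, and fragment `1.0`, while a bare `example.com` is rejected because it has no path.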


```
$ curl -H 'Accept: application/vnd.oci.image.index.v1+json' https://a.b.example.com/ref/a.b.example.com/c/d
{
```

Owner

This is purely one approach to retrieving a blob from an image hub. This document needs only the protocol, and to present an object (descriptor + CAS engine). Does this need to be included in the document?

Contributor Author

This is purely one approach to retrieving a blob from an image hub.

I thought a full example would be informative. The normative portion of the index template protocol spec all happens above the “Example” header.

This document needs only the protocol, and to present an object (descriptor + CAS engine).

Do you mean “why is casEngines missing from the response?”? I've added it with 292435d → 2bb26ec.

Owner

This is purely one approach to retrieving a blob from an image hub... Does this need to be included in the document?

I thought a full example would be informative. The normative portion of the index template protocol spec all happens above the “Example” header.

I mean:

  1. The object retrieved is not of type application/vnd.oci.image.index.v1+json; it should be a separate type created by the discovery implementation.
  2. The example can just paste the object content; it needn't paste the curl command. There are too many possible approaches.

Owner

This document needs only the protocol, and to present an object (descriptor + CAS engine).

Do you mean “why is casEngines missing from the response?”? I've added it with 292435d → 2bb26ec.

Yes, this should be added.

Contributor Author

  1. The object retrieved is not of type application/vnd.oci.image.index.v1+json; it should be a separate type created by the discovery implementation.

But it is an application/vnd.oci.image.index.v1+json; with the casEngines proposal we'll update image-spec to add the new descriptor property. And the point of the index template protocol is that consumers can recycle their index.json tooling, so we want the same type.

  2. The example can just paste the object content; it needn't paste the curl command. There are too many possible approaches.

Folks can certainly fetch this with tools other than cURL. But I think it's a simple way to demonstrate fetching the index object from the expanded URI with content negotiation.
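
For readers who would rather not shell out to cURL, the same content-negotiated fetch can be sketched with the Python standard library. This is only an illustration under the assumptions of the thread: the function names are hypothetical, and the URI is whatever the index template expanded to.

```python
import json
import urllib.request

# Media type the index template protocol requires servers to support.
INDEX_MEDIA_TYPE = "application/vnd.oci.image.index.v1+json"


def build_index_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks for the OCI index media type."""
    return urllib.request.Request(uri, headers={"Accept": INDEX_MEDIA_TYPE})


def fetch_index(uri: str) -> dict:
    """Fetch and decode the index JSON from an expanded URI."""
    with urllib.request.urlopen(build_index_request(uri)) as response:
        return json.load(response)
```

Calling `fetch_index("https://a.b.example.com/ref/a.b.example.com/c/d")` would perform the same request as the curl example, assuming a server is actually listening at that example host.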

@wking wking force-pushed the ref-engine-discovery branch 2 times, most recently from 292435d to 2bb26ec Compare September 7, 2017 04:19
@xiekeyang
Owner

I'm drafting a commit 1 (not finished and not yet submitted as a PR) on branch discovery. It does a similar thing to this PR. But that doesn't matter; we just hope to present a good specification.

I had thought that we should build two well-known URIs, one for retrieving the ref-engine object and one for the cas-engine object. See 2.

https://example.com/.well-known/oci-index/reference?...

and

https://example.com/.well-known/oci-index/descriptor?...

Because I think consumers want them for different purposes.


The server providing the expanded URI MUST support requests for media type [`application/vnd.oci.image.index.v1+json`][index].
Servers MAY support other media types using HTTP content negotiation, as described in [RFC 7231 section 3.4][rfc7231-s3.4] (which is [also supported over HTTP/2][rfc7540-s8]).

Owner

I think the object data SHOULD be built and stored in the discovery service permanently, while the discovery service synchronizes the object content. Consumers then retrieve this object from the discovery backend directly; that approach feels more like what we discussed before.

Your language Consumers retrieving application... seems to describe this workflow:

  1. the client sends a request to the discovery URI;
  2. the discovery service retrieves the newest descriptor from the distribution URI;
  3. the discovery service packages the descriptor and casEngines internally;
  4. the discovery service returns the packaged object to the client.

Contributor Author

@wking wking Sep 7, 2017

I think the object data SHOULD be built and stored in the discovery service permanently, while the discovery service synchronizes the object content. Consumers then retrieve this object from the discovery backend directly; that approach feels more like what we discussed before.

Your language Consumers retrieving application... seems to describe this workflow:

  1. the client sends a request to the discovery URI;
  2. the discovery service retrieves the newest descriptor from the distribution URI;
  3. the discovery service packages the descriptor and casEngines internally;
  4. the discovery service returns the packaged object to the client.

I'd use “ref engine” where you use “discovery service”, but yeah, having a proxy ref engine that resolves a name via a second ref engine and injects casEngines entries is possible (and using such a proxy is one way around the “both my ref-engine discovery and terminal ref-engine service are dumb” issue). But with casEngines added to image-spec as a descriptor property, folks who wanted to could certainly set it statically in the application/vnd.oci.image.index.v1+json, so you can have a completely static ref-engine service if you wanted.
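
The proxy idea above (resolve a name via a second ref engine, then inject casEngines entries) might look roughly like the following Python sketch. It assumes casEngines is an array on each descriptor in the index's `manifests` list and that each entry is an object with `protocol` and `uri` keys; that entry shape is an assumption for illustration, not something this thread pins down, and the function name is hypothetical.

```python
import copy


def inject_cas_engines(index: dict, cas_engines: list) -> dict:
    """Return a copy of INDEX with CAS_ENGINES added to every descriptor.

    The casEngines entry shape used by callers is an assumption.
    """
    result = copy.deepcopy(index)  # leave the upstream response untouched
    for descriptor in result.get("manifests", []):
        existing = descriptor.setdefault("casEngines", [])
        for engine in cas_engines:
            # Skip hints the upstream ref engine already supplied.
            if engine not in existing:
                existing.append(copy.deepcopy(engine))
    return result
```

A completely static ref engine would skip this step and simply serve index JSON that already carries the property.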


wking referenced this pull request Sep 7, 2017
@wking
Contributor Author

wking commented Sep 7, 2017

I had thought that we should build two well-known URIs, one for retrieving the ref-engine object and one for the cas-engine object.

I've spun this off into this comment.

@wking wking force-pushed the ref-engine-discovery branch from 2bb26ec to f815358 Compare September 7, 2017 16:36
@wking
Contributor Author

wking commented Sep 7, 2017

I've pushed 2bb26ec → f815358, changing .well-known/oci/ref-engines to .well-known/oci-ref-engines so we can register the ref-engine discovery spec independently from other OCI .well-known specs. Based on this comment.

@wking wking force-pushed the ref-engine-discovery branch from f815358 to 595142e Compare September 7, 2017 17:58
@wking
Contributor Author

wking commented Sep 7, 2017

I've pushed f815358 → 906725d fixing a path-rootless → path typo and adding two examples of serving everything from a single static Nginx server.

@wking wking force-pushed the ref-engine-discovery branch 4 times, most recently from 3f8a8cd to d694d96 Compare September 7, 2017 18:08
* `digest`, matching `digest` in the [`digest` rule][digest].
* `algorithm`, matching `algorithm` in the `digest` rule.
* `encoded`, matching `encoded` in the `digest` rule.

Owner

@xiekeyang xiekeyang Sep 8, 2017

I'm not sure if these variables SHOULD be restricted in spec. E.g. a CAS URI:

https://example.com/oci-cas/ubuntu/14.04

This returns a pure CAS object (OCI index media type), but the URI path doesn't conform to what you defined. Do you want to reject it?
And I really see no reason to restrict it. This URI is provided by the image provider, and they can use any template they like.
Are you restricting these because you want to parse them from the URI? But how would you parse them?

Owner

I feel we can allow consumers to define URIs with unrestricted templates. Am I missing something?

Owner

I'm not sure if these variables SHOULD be restricted in spec. E.g. a CAS URI:

https://example.com/oci-cas/ubuntu/14.04

This returns a pure CAS object (OCI index media type), but the URI path doesn't conform to what you defined. Do you want to reject it?
And I really see no reason to restrict it. This URI is provided by the image provider, and they can use any template they like.
Are you restricting these because you want to parse them from the URI? But how would you parse them?
I feel we can allow consumers to define URIs with unrestricted templates. Am I missing something?

Oh, I really did miss something. You define oci-cas-template-v1 for the REF and CAS protocols, and allow consumers to extend with their favorite protocols. Ignore my questions above; they make no sense.

Contributor Author

You define oci-cas-template-v1 for the REF and CAS protocols...

nit: oci-cas-template-v1 is just for CAS. oci-index-template-v1 is for refs.
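
To make the `digest`, `algorithm`, and `encoded` variables from this thread concrete, here is a minimal Python sketch of expanding a CAS URI template for one blob. It handles only plain `{name}` expressions and percent-encodes values outside the unreserved set; it is a simplified stand-in, not a full URI Template processor. The function name and the `cas.example.com` templates in the usage note are hypothetical.

```python
import re
from urllib.parse import quote


def expand_cas_template(template: str, digest: str) -> str:
    """Expand {digest}, {algorithm}, and {encoded} in TEMPLATE."""
    # A digest has the form algorithm ":" encoded.
    algorithm, sep, encoded = digest.partition(":")
    if not algorithm or not sep or not encoded:
        raise ValueError("invalid digest: {!r}".format(digest))
    variables = {"digest": digest, "algorithm": algorithm, "encoded": encoded}

    def substitute(match):
        # Unknown variables expand to the empty string.
        value = variables.get(match.group(1), "")
        # Percent-encode everything outside the unreserved characters.
        return quote(value, safe="")

    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)
```

With the template `https://cas.example.com/{algorithm}/{encoded}` and the digest `sha256:abc123`, this produces `https://cas.example.com/sha256/abc123`. Using `{digest}` directly percent-encodes the colon, giving `sha256%3Aabc123`.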

@xiekeyang
Owner

xiekeyang commented Sep 8, 2017

@wking I think you can add some explanation of the goal of ref engines. It is for retrieving the URI of an image name and its basic information, with mutable content. And clarify that in most cases refEngine has no relationship to casEngine. This language would make things clearer for readers, IMO.

For image names which are supported for [`host-based-image-name`](host-based-image-names.md), a ref-engine discovery object MAY be retrieved from the following [well-known URI][well-known]:

https://{host}/.well-known/oci-ref-engines

Owner

@xiekeyang xiekeyang Sep 8, 2017

I think you mean /.well-known/oci-ref-engines is the canonical path. Should we add language like the following?

Where /.well-known/oci-ref-engines is the canonical endpoint. The ref engine will handle discovery requests only under this endpoint. (my language is rough)

Owner

Another question: how does a consumer specify the discovery target? E.g. a consumer wants to select a specific image by name, os, arch...

We might define some variables, which would look like:

https://a.b.example.com/.well-known/oci-ref-engines?name=ubuntu&os=linux&arch=amd64

Contributor Author

https://a.b.example.com/.well-known/oci-ref-engines?name=ubuntu&os=linux&arch=amd64

I don't think those belong in the ref-engine discovery URI, which is for "these services know how to resolve my names". That's unlikely to be OS- or arch-dependent. For example, I would be surprised if someone pushed all their Windows refs to one oci-index-template-v1 ref engine but pushed all of their Linux refs to one docker ref engine. And even if they did, they could get less efficient support by listing both ref engines without OS/arch qualifiers.

I think there might be a stronger case for including os and arch in the oci-index-template-v1 variables, so you could use separate index JSON for different arch and/or OS. But I think the current variables give you enough granularity to keep the index JSON small, and with opencontainers/runtime-spec#405 still up in the air, I'm not sure if images will continue to have a single, well-defined OS (although I think the chances are better for images than for runtime configs). So for the moment, I'd rather leave OS/arch choices in oci-index-template-v1 up to the usual index JSON handler. Other ref-engine protocols are, of course, free to use that information however they see fit.

Contributor Author

I think you mean /.well-known/oci-ref-engines is the canonical path.

Yeah, that definition and the adjacent bit about walking the DNS ancestors were weak. I've taken a stab at tightening them up in d694d96 → f56204e, which also renames oci-ref-engines to oci-host-ref-engines to emphasize the host-association now covered here.
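
As a rough illustration of "walking the DNS ancestors", a client might build its list of candidate discovery URIs along these lines. This is a sketch under assumptions: the function name is hypothetical, and the stopping point (controlled here by a made-up `min_labels` parameter that keeps the walk from reaching a bare top-level label) is my own choice rather than something stated in this thread.

```python
def candidate_discovery_uris(host: str, min_labels: int = 2) -> list:
    """List well-known URIs for HOST and then for its DNS ancestors."""
    labels = host.rstrip(".").split(".")
    uris = []
    while labels:
        uris.append(
            "https://{}/.well-known/oci-host-ref-engines".format(".".join(labels))
        )
        labels = labels[1:]  # step up to the parent domain
        # Stop before ancestors with fewer than min_labels labels.
        if len(labels) < min_labels:
            break
    return uris
```

For `a.b.example.com` this tries `a.b.example.com` first, then `b.example.com`, then `example.com`.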

@wking wking force-pushed the ref-engine-discovery branch 2 times, most recently from 406ee2e to f56204e Compare September 8, 2017 17:53
@wking
Contributor Author

wking commented Sep 8, 2017

I think you can add some explanation of the goal of ref engines. It is for retrieving the URI of an image name and its basic information, with mutable content.

There's something like that in the glossary entry. I looked that over again, but cannot think of anything more to add. Maybe you can recommend some wording?

And clarify that in most cases refEngine has no relationship to casEngine.

There's something like that in the out-of-scope line (which is now the final line in the Merkle root glossary entry). But while the Merkle root's opinion on casEngines is independent of ref-engine discovery, casEngines itself is no longer independent (since b251fe1 which allows a casEngines entry in the ref-engines object).

This language would make things clearer for readers, IMO.

Language suggestions welcome :).

This is mostly from [1], but pulls in ideas from the rest of the
discussion with Keyang and Aleksa there, as well as the inspirational
specs mentioned in the README.

Use a single segment-nz (no slashes) after .well-known/ to comply with
[2]:

  Registered names MUST conform to the segment-nz production in
  [RFC3986].

It also has:

  It MAY also contain additional information, such as the syntax of
  additional path components, query strings and/or fragment
  identifiers to be appended to the well-known URI...

but using a single segment-nz for this protocol allows us to register
it independently of other OCI .well-known URIs if we gain such in the
future.

The well-known URI registration follows the template from [3], with
inspiration taken from the examples in [4,5,6,7,8].

[1]: xiekeyang#1 (comment)
[2]: https://tools.ietf.org/html/rfc5785#section-3
[3]: https://tools.ietf.org/html/rfc5785#section-5.1.1
[4]: https://tools.ietf.org/html/rfc6415#section-6.1
[5]: https://tools.ietf.org/html/rfc6690#section-7.1
[6]: https://tools.ietf.org/html/rfc6764#section-9.1.1
[7]: https://tools.ietf.org/html/rfc6940#section-14.1
[8]: https://tools.ietf.org/html/rfc7033#section-10.1
And allow casEngines in the ref-engine discovery response, supporting
folks who want to defer to a ref-engine that is hard/impossible to
update (e.g. index JSON on someone else's server for a
oci-index-template-v1 ref engine) but who still want to provide their
own CAS engine hints.
To collect an example of the whole process in one place, for folks who
want to get off the ground quickly with a basic solution.
For folks who want to diverge as little as possible from things
already in image-spec.  Downsides to this approach include:

* Non-sharded blobs [1], although it's not clear to me that modern
  filesystems suffer from having many entries in one directory [2].

* Possible duplicate blobs between two layouts.  You can address this
  with symlinks or similar, but you'd need extra tooling to do that.
  With a single CAS bucket, there's only one place that the blob could
  be, so deduping is free (but garbage collection becomes more
  complicated).

[1]: opencontainers/image-spec#449
[2]: opencontainers/image-spec#94 (comment)
@wking wking force-pushed the ref-engine-discovery branch 2 times, most recently from c74c011 to b733918 Compare September 8, 2017 23:45
@wking
Contributor Author

wking commented Sep 8, 2017

I've pushed f56204e → 89e3eb5 with some header tweaks, a list of specs and registries in the README, and a Python implementation of everything except the CAS engines.

This covers everything except the CAS engines.
@wking wking force-pushed the ref-engine-discovery branch from b733918 to 89e3eb5 Compare September 8, 2017 23:50
## Images associated with a host's `oci-host-ref-engines`

Publishers SHOULD populate the `oci-host-ref-engines` resource with ref engines which are capable of resolving image names that match the [`host-based-image-name` rule](host-based-image-names.md) with a `host` part that matches their [fully qualified domain name][rfc1594-s5.2] and its subdomains or deeper descendants.
For example, https://b.example.com/.well-known/oci-host-ref-engines SHOULD prefer ref engines capable of resolving image names with `host` parts matching `b.example.com`, `a.b.example.com`, etc.

Owner

nit: https://b.example.com/.well-known/oci-host-ref-engines should be wrapped in ' ', otherwise it will be rendered as a hyperlink that cannot be visited.
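
The matching described in the spec text under review (a `host` part equal to the publisher's fully qualified domain name, or any subdomain or deeper descendant of it) can be sketched as a small predicate. The function name is hypothetical, and treating names case-insensitively and ignoring a trailing dot are ordinary DNS conventions rather than something this spec text states.

```python
def host_matches_publisher(image_host: str, publisher_fqdn: str) -> bool:
    """True if IMAGE_HOST is PUBLISHER_FQDN or one of its descendants."""
    # Normalize: DNS names are case-insensitive; drop any trailing dot.
    image_host = image_host.rstrip(".").lower()
    publisher_fqdn = publisher_fqdn.rstrip(".").lower()
    # Require a label boundary so "xb.example.com" does not match
    # "b.example.com".
    return image_host == publisher_fqdn or image_host.endswith("." + publisher_fqdn)
```

So `b.example.com` and `a.b.example.com` both match a publisher at `b.example.com`, while `example.com` does not.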

@xiekeyang
Owner

LGTM 🥇

It has reached almost everything we discussed before, and can already be merged. In the future we can go on to improve and extend the capabilities. And one question: why does oci-discovery implementation 1 use python rather than golang? For anyone who wants to call the API, maybe golang is more popular in the OCI ecosystem. Of course, implementation 1 is amazing, and we can add golang later if necessary.

@xiekeyang xiekeyang merged commit a505058 into xiekeyang:master Sep 9, 2017
@wking
Contributor Author

wking commented Sep 9, 2017

And one question: why does oci-discovery implementation 1 use python rather than golang?

I write Python much faster than Go ;). Agreed that a Go implementation would be good too.

@wking wking deleted the ref-engine-discovery branch September 11, 2017 04:45
wking added a commit to wking/oci-discovery that referenced this pull request Sep 11, 2017
I expect the code I wrote only works on Python 3, and PEP 394 suggests
python3 for that [1].  I got the Makefile entries right in 89e3eb5
(oci_discovery: Add a Python 3 implementation, 2017-09-08, xiekeyang#2), but
botched this one.

[1]: https://www.python.org/dev/peps/pep-0394/#recommendation
wking added a commit to wking/oci-discovery that referenced this pull request Sep 13, 2017
These slipped in as a copy/paste error in d153511 (cas-template:
Define and register the oci-cas-template-v1 protocol, 2017-09-06, xiekeyang#2).
wking added a commit to wking/oci-discovery that referenced this pull request Sep 13, 2017
This slipped in with d153511 (cas-template: Define and register the
oci-cas-template-v1 protocol, 2017-09-06, xiekeyang#2).  I've tried to
consistently use "CAS engines" for nouns and "CAS-engines" for
adjectives.