Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Show/Get-Info API Requirements #232

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

SteveLasker
Copy link
Contributor

We've captured the scenarios for Listing content in a registry

  • list of repositories
  • list of manifests
  • list of tags

The premise is the listing apis are paginated lists, but don't have details. However, that is an assumption that people want to tease out the list with the details on a specific object.

This PR captures the scenarios for getting info (show/get-info) APIs.

A discussion for whether these should be separate APIs or the details can/should be combined with the results returned from the List apis is also interesting.

Signed-off-by: Steve Lasker <stevenlasker@hotmail.com>
Signed-off-by: Steve Lasker <stevenlasker@hotmail.com>
Signed-off-by: Steve Lasker <stevenlasker@hotmail.com>
@jonjohnsonjr
Copy link
Contributor

I am really confused about why you want to check in a requirements doc to the spec?

@SteveLasker
Copy link
Contributor Author

How many times have you looked at some code and wondered, "why is this here, and what can I add without breaking it?"
Ultimately, it's intended to facilitate a conversation on the what we want to achieve so we can work on the how in the spec. Once the spec for these enhancements is flushed out, we could either delete these *requirements.md docs, or leave them as open PRs until the spec is complete, then close the PRs without merging. At this point, I'm just trying to get everyone on the same page for what and when.

@jonjohnsonjr
Copy link
Contributor

jonjohnsonjr commented Feb 5, 2021

we propose

Who is we?

To foster consistency
enabling the community
To facilitate discussions, prioritization and phased implementations

Just as general feedback, I would omit stuff like this. It's really difficult to understand what you're actually proposing as requirements with all this fluff. Trying to read between the lines, it seems like what you're trying to say is something like:

"We need an API to read metadata for a given resource."

I think that's reasonable.

You should clarify how this is different from the existing APIs. Your scenario is that "A user can get info", but
it is possible already to get info on a specific tag or digest, that's just a GET request to the manifests handler. Similarly, for repositories, there's GET on the tags handler. Does it make sense to expose any of this as HTTP headers instead of creating a new API?

It seems that you're talking about metadata that is external to the CAS, which isn't super clear. There are annotations on manifests and descriptors, even labels within images -- you should spell out why we need a fourth metadata API that isn't satisfied by what exists, and what kind of metadata is appropriate for each place. Does it make sense to propagate artifact annotations or labels out into this new set of metadata? Or should registries avoid introspecting these artifacts and only surface registry-centric metadata here?

tl;dr, I think you need to do some due diligence for the current state of things before suggesting another API, given that labels, annotations, and HTTP headers already exist.

(I'm not necessarily opposed to adding something like this, if I'm understanding you correctly, but I'd like this PR to convince me that you have thought about this and weighed the cost of adding a fourth set of metadata.)

If we were to apply this to my proposal for listing APIs, it would make sense to me that you should be able to get an individual descriptor from the list of descriptors if you don't care about the entire set.

To borrow from my examples, something like this sounds like a great idea:

GET /v2/<name>/descriptors/sha256:7a47ccc3bbe8a451b500d2b53104868b46d60ee8f5b35a24b41a86077c650210
{
   "digest":"sha256:7a47ccc3bbe8a451b500d2b53104868b46d60ee8f5b35a24b41a86077c650210",
   "mediaType":"application/vnd.docker.distribution.manifest.v2+json",
   "size":2035,
   "tags":[
      "latest",
      "v1"
   ],
   "annotations":{
      "org.opencontainers.image.created":"1985-04-12T23:20:50.52Z"
   }
}

The same idea could apply to tags:

GET /v2/<name>/descriptors/v1
{
   "digest":"sha256:7a47ccc3bbe8a451b500d2b53104868b46d60ee8f5b35a24b41a86077c650210",
   "mediaType":"application/vnd.docker.distribution.manifest.v2+json",
   "size":2035,
   "tags":[
      "latest",
      "v1"
   ],
   "annotations":{
      "org.opencontainers.image.created":"1985-04-12T23:20:50.52Z"
   }
}

Note that this would return the same thing as when requested by digest. This seems like it would be a nice addition, and make the existing tag listing API comparatively anemic.

This also would make sense for repositories, if we added a RepositoryDescriptor or something to my proposal.

Data Returned

I'm going to assume that the first table is stuff you think we should add to the spec, and the second table is just a set of examples of things registries might want to add. If so, I'm strongly opposed to everything in this PR, except for manifestCount and tagCount, to which I am opposed just the normal amount.

repoName doesn't really make any sense. If you need to know the repository name to ask for info, this is basically just echoing your request back to you.

dateCreated and totalPullCount require a registry to store additional metadata that it does not store today, which might be impossible to know. You give "The date the repository was created" as a description, so would this only apply to repositories? What about images? There's a conceptual conflict with dateCreated as it pertains to images: an image has a creation timestamp in its config file; would this refer to that? Or are you saying you think "push time" for an image would be interesting to surface?

manifestCount and tagCount should be implementable by anyone, but some implementations would be inefficient, requiring a full scan. They could be useful for determining pagination parameters and such, though. I think I'd rather see these as headers on the response to a HEAD request against the tag listing, manifest listing, or repository listing APIs, not as a separate thing.

The auth stuff is very registry-specific, and I would not want to standardize them in any way without having a separate discussion about the authentication spec.

I think it would be reasonable for any of this stuff to be returned as annotations on descriptors, from the examples I gave above. I also think it would be reasonable for the distribution spec to define some standard annotations, but all the examples except for dateCreated (depending on what you mean) are... not great.

org
namespace
repository
registry

You seem to have invented your own vocabulary, and it's unclear what distinctions you're making here.

@SteveLasker
Copy link
Contributor Author

SteveLasker commented Feb 19, 2021

For the examples above, we're still debating potentially multiple valid designs, when we need to define the requirements to figure out which design meets our requirements:

I've added some additional context to the Listing API Requirements

manifestCount and tagCount should be implementable by anyone, but some implementations would be inefficient, requiring a full scan.

These are examples, and we can debate which would be included in the common elements, or optional. But, it also brings up a point that just because we have a defined "common element" value, doesn't mean all registries would need to return a value. What I'm hoping is we can define a common schema, so a client could get a well-known/common value from any registry that supports it. As opposed to having a big switch statement for each registry's location for the same value.

You seem to have invented your own vocabulary, and it's unclear what distinctions you're making here.

There have been many different ways to reference the various parts of the uri. To help with some clarity, I wrote this post: Registry Names, Namespaces, Images, Artifacts & Tags and we added info to the Artifact Definition & Terms

I'm strongly opposed to everything in this PR, except for manifestCount and tagCount, to which I am opposed just the normal amount.

Is there anything you're not opposed to? 😏 I'm looking for common ground so we can actually move these projects forward.

I'm also happy to consolidate this into the Listing API Requirements #229

@jonjohnsonjr
Copy link
Contributor

jonjohnsonjr commented Feb 19, 2021

Is there anything you're not opposed to? 😏

In terms of breaking changes, probably not.

For additive changes that introduce new concepts, the change need to be obviously good, leverage existing extension points, and easy to implement for everyone. We have a (very) finite number of innovation tokens to spend here, especially given that this is a protocol and not a product. It's easy to change things that are under your control, but really difficult to get an entire software ecosystem to come along with you.

For anything that's new, ideally you can just point to an RFC, standard, spec, or convention to describe how it should work. The OCI specs don't exist in a vacuum. They rely heavily on proven technology that has seen years of design, revision, and production use. I'm going to be very skeptical of any changes that introduce new technology, concepts, semantics, or abstractions, especially if they don't leverage the existing extension points within the spec.

I had a bit of fun and put together a graph of most of the stuff OCI relies on and how its own concepts are organized (mostly centered around image-spec):
image

Basically, any proposals that add new nodes to that graph are going to be really tough to justify, at least to me.

Any proposals that clarify existing semantics or borrow more from the RFCs and specs that we already rely on are much more palatable.

As you can see from the graph, the descriptor is basically the nexus of the image spec. It is our primitive of abstraction, and any interesting semantics should go through the descriptor (IMO).

I know this doesn't have much to do with this PR, but I wanted to try to communicate why it might seem like my purpose in life is to oppose any and all progress. It's not that I don't want to fix anything -- I really do -- it's just that I don't want to inflict any more bad abstractions onto the world, so I'm going to play devil's advocate for most proposals.

There are lots of problems that can be solved with the abstractions we have and by incorporating more of what we already rely on. For example, consider the zstd layer discussion.

The zstd changes were rushed through for the image spec without considering how they would affect everything else. Backwards compatibility concerns were hand-waved away as "clients should just ignore stuff they don't understand", which is fine, but that doesn't allow for any kind of smooth transition. We've hashed out a lot of this in opencontainers/image-spec#803 and never really came to a good solution, I think because everyone was trying to solve the problem at the wrong level of abstraction.

The image-spec is not the place for defining new media type suffixes or media types in general. As we discussed at length regarding your "artifacts" convention, it does not make sense for us (or you), to maintain a registry of media types. The IANA exists for exactly this reason, and the semantics around how media types work are very well defined in RFC 6838. OCI relies on media type semantics, so if you want a media type suffix to exist for zstd compression (+zstd), the correct thing to do is to add it to the IANA suffix registry. This hadn't happened, but facebook was more than happy to fix it when I asked. Similarly, we can't just invent new syntax for media types -- your misunderstanding of the media types grammar led to a lot of needless back and forth about what a media type meant. Again, it is not the responsibility of OCI to define how media types work. That has already been done.

Now, look at #235

This proposal would rely on existing semantics from RFC 7231 to enable registries and clients to negotiate different encodings. One of the possible, standard encodings is zstd. I believe that this is where the zstd problem ought to have been solved, not at the image-spec level. As I mention in this issue, we don't have to make any changes to the specs to take advantage of this, because it's just how HTTP works.

Perhaps we can clarify some things to make it easier for clients and registries to interoperate, but we definitely don't need to invent anything new to solve this problem. It's been solved since 2014, though really the parts we care about are from RFC 2616, from 1999. This is the kind of proposal I'm really happy to see and more than willing to merge, and I think the point of the Technical Oversight Board is to guide problem solving towards solutions like this.

Base automatically changed from master to main March 9, 2021 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants