Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixes to make local registry (unauthenticated) usable #1164

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

wolfv
Copy link

@wolfv wolfv commented Oct 16, 2020

These are a couple of small fixes to make images of type localhost:32000/my-image work throughout binder.

With these fixes I was able to set up a microk8s single-machine cluster that serves binder.

@welcome
Copy link

welcome bot commented Oct 16, 2020

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
welcome
You can meet the other Jovyans by joining our Discourse forum. There is also a intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@wolfv
Copy link
Author

wolfv commented Oct 16, 2020

hmm i am just realizing that this might break docker images that are like org/some-docker... maybe I should check that the splitted prefix matches the registry location.

Copy link
Member

@minrk minrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for investigating this! I've no objection to the changes here, but I can't quite see how this is affecting the results. When I test it with sample images, the registry host is stripped off by truncating images. What are the values of the image string where this doesn't do the right thing? I tried with localhost:32000/repo/image:tag and it all seems right already.

@@ -291,7 +291,7 @@ async def get(self, provider_prefix, _unescaped_spec):
if self.settings['use_registry']:
for _ in range(3):
try:
image_manifest = await self.registry.get_image_manifest(*'/'.join(image_name.split('/')[-2:]).split(':', 1))
image_manifest = await self.registry.get_image_manifest(*'/'.join(image_name.split('/')[-2:]).rsplit(':', 1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this change doesn't affect the result since image_name.split("/")[-2:] will drop the hostname, so this is called without the host:

(py3) In [14]: for image_name in ["gcr.io/repo/name", "repo/name", "repo/name:tag", "localhost:123/repo/name", "localhost:123/repo/name:tag"]:
          ...:     image_tag = '/'.join(image_name.split('/')[-2:]).split(':', 1)
          ...:     print(f"{image_name} -> {image_tag}")
          ...:     assert '/'.join(image_name.split('/')[-2:]).split(':', 1) == '/'.join(image_name.split('/')[-2:]).rsplit(':', 1)
          ...:
gcr.io/repo/name -> ['repo/name']
repo/name -> ['repo/name']
repo/name:tag -> ['repo/name', 'tag']
localhost:123/repo/name -> ['repo/name']
localhost:123/repo/name:tag -> ['repo/name', 'tag']

@@ -188,6 +188,10 @@ def _default_password(self):
@gen.coroutine
def get_image_manifest(self, image, tag):
client = httpclient.AsyncHTTPClient()
if '/' in image:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell, this / is never in image, since it is split off of the image. Can you tell where it's being called with the host?

@minrk
Copy link
Member

minrk commented Oct 30, 2020

Ah, on re-reading more carefully, I realize that a local registry can have top-level images (i.e. localhost:3200/image:tag), and our code otherwise assumes that all images have the form repo/image:tag so there's exactly one slash in the repository name itself. Does it work if you do it that way, e.g. imagePrefix: "localhost:3200/binder/r2d-" without any other changes?

I think it would probably be good to actually parse this info rather than our increasingly nested sequence of image splits.

What I don't understand, though, is how docker determines if a tag of the form a/b is actually docker.io/a/b or http[s]://a/b. Does it have special handling of localhost and otherwise check for . in the first part? From the docker spec for "Repository":

Repository
A collection of tags grouped under a common prefix (the name component before :). For example, in an image tagged with the name my-app:3.1.4, my-app is the Repository component of the name. A repository name is made up of slash-separated name components, optionally prefixed by a DNS hostname. The hostname must comply with standard DNS rules, but may not contain _ characters. If a hostname is present, it may optionally be followed by a port number in the format :8080. Name components may contain lowercase characters, digits, and separators. A separator is defined as a period, one or two underscores, or one or more dashes. A name component may not start or end with a separator.

I've tested with a local registry running on port 80 so I could exclude port info:

docker run --rm -d -p 127.0.0.1:80:5000 --name registry registry:2

and I was able to push images with 0-many / separators, so our code's assumption that there's exactly 1 isn't always valid:

docker tag registry:2 localhost/registry:2
docker push localhost/registry:2
docker tag registry:2 localhost/a/b/c/d/registry:2
docker push localhost/a/b/c/d/registry:2

Docker's implementation defines a domain prefix as containing . or : with a special handling of localhost only (other port- or tld-less hosts won't work):

# added 127.0.0.1 testhost to /etc/hosts
$ docker tag registry:2 testhost/registry:2
$ docker push testhost/registry:2
The push refers to repository [docker.io/testhost/registry]
...

So we can implement proper image string parsing to always return registry, repository, and tag:

def parse_image(image_name):
    """Parse a docker image string into (registry, repository, tag)"""
    # image_name could be "localhost:5000/foo:tag" or "gcr.io/project/subrepo/sub/sub/sub:tag" or "org/repo:tag" for implicit docker.io
    registry, _, repo_tag = image_name.partition("/")
    if registry == "localhost" or any(c in registry for c in (".", ":")):
        # registry host is specified
        pass
    else:
        # registry not given, use implicit default docker.io registry
        registry = "docker.io"
        repo_tag = image_name
    repository, _, tag = repo_tag.partition(":")
    return (registry, repository, tag)

That eliminates our assumptions about how many / characters are in the repository string. (I used partition here since it always returns the same number and type of items, unlike split which depends on the presence of the separator).

@adriendelsalle
Copy link
Contributor

Hi! @wolfv @minrk

I was running in the exact same situation.

Does it work if you do it that way, e.g. imagePrefix: "localhost:3200/binder/r2d-" without any other changes?

Yes! Using something like your-registry.io/<some-project-name>/<prefix>- as image_prefix is working fine, as stated in the docs.

The parse_image looks fine! Maybe we could also add some validation to image_prefix to be sure it's not prefixed by http[s]:// that would cause parse_image returning wrong information ?
The zero-to-binderhub doc should be updated accordingly.

I would be happy to help :)

@wolfv wolfv force-pushed the fix_local_registry branch from 2ed984f to 25e8317 Compare February 17, 2021 19:10
test image_prefix trait validation
add image name parser
simplify get_image_manifest method
extend get_image_manifest tests
update zero-to-binderhub doc, private registry section
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants