External Container Reg creds not always available #791

Closed
tylerpotts opened this issue Aug 24, 2021 · 5 comments · Fixed by #945
Labels
type: bug 🐛 Something isn't working

Comments

@tylerpotts
Contributor

The initial job that fetches the AWS Docker login details needs to run before JupyterHub starts; otherwise the image puller can get stuck.

@github-actions

This issue has been automatically marked as stale because there was no recent activity in 60 days. Remove the stale label or add a comment, otherwise, this issue will automatically be closed in 7 days if no further activity occurs.

@github-actions github-actions bot added the status: stale 🥖 Not up to date with the default branch - needs update label Oct 25, 2021
@danlester danlester removed the status: stale 🥖 Not up to date with the default branch - needs update label Oct 25, 2021
@iameskild
Member

iameskild commented Nov 17, 2021

@danlester @tylerpotts would either of you mind adding a little more detail to this issue? Thank you :)

@danlester
Contributor

Background

By default, images such as the default JupyterLab image specified as quansight/qhub-jupyterhub:v0.3.13 will be pulled from Docker Hub.

To specify a private AWS ECR (and this technique should work regardless of which cloud your QHub is deployed to), first provide details of the ECR and AWS access keys in qhub-config.yaml:

external_container_reg:
  enabled: true
  access_key_id: <AWS access key id>
  secret_access_key: <AWS secret key>
  extcr_account: 12345678
  extcr_region: us-west-1

You can then specify private Docker images such as 12345678.dkr.ecr.us-west-1.amazonaws.com/quansight/qhub-jupyterlab:mytag in your qhub-config.yaml file. The AWS key and secret provided must have the relevant ECR IAM permissions to authenticate with and read from the ECR container registry.
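To illustrate the naming convention above: the registry host is built from extcr_account and extcr_region, and an image "refers to the ECR" when the part of its reference before the first "/" is that host. The helpers below are a hypothetical sketch (ecr_registry_host and uses_external_registry are not QHub functions), assuming the standard private ECR hostname pattern.

```python
def ecr_registry_host(account: str, region: str) -> str:
    # Hypothetical helper: private ECR registries are addressed as
    # <account>.dkr.ecr.<region>.amazonaws.com
    return f"{account}.dkr.ecr.{region}.amazonaws.com"


def uses_external_registry(image: str, account: str, region: str) -> bool:
    """Return True if the image reference points at the configured ECR."""
    # The registry is the part of the image reference before the first "/".
    registry = image.split("/", 1)[0]
    return registry == ecr_registry_host(account, region)


if __name__ == "__main__":
    private = "12345678.dkr.ecr.us-west-1.amazonaws.com/quansight/qhub-jupyterlab:mytag"
    public = "quansight/qhub-jupyterhub:v0.3.13"
    print(uses_external_registry(private, "12345678", "us-west-1"))  # True
    print(uses_external_registry(public, "12345678", "us-west-1"))   # False
```

Only images for which this check is true depend on the processed ECR credentials; Docker Hub images keep pulling normally.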

Detail for this Issue

When deploying QHub with external_container_reg enabled and one or more images referring to that ECR, the deployment may fail initially because image pulls don't work until the specified credentials have been processed and stored in a Kubernetes secret.

This processing happens in a CronJob (so it makes sense that it might not have run yet) and also in a Job (named job_extcr_cred_updater in the Terraform code). The Job/CronJob uses the credentials provided in the YAML file to generate a Docker-login auth token of some sort for the ECR. One would hope that the Job runs early enough for image pulls to use that token.
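As a rough sketch of what "processing the credentials" involves (not the actual job_extcr_cred_updater code): ECR's GetAuthorizationToken API returns a base64 token that decodes to "AWS:<password>" plus a proxy endpoint, and that pair can be turned into the .dockerconfigjson payload of a kubernetes.io/dockerconfigjson Secret. ECR tokens expire after 12 hours, which is presumably why a CronJob refreshes them. The function names and the "dev" namespace below are hypothetical; the secret name extcrcreds is taken from later in this thread.

```python
import base64
import json


def docker_config_json(authorization_token: str, proxy_endpoint: str) -> str:
    """Build a .dockerconfigjson payload from an ECR authorization token."""
    # ECR's token decodes to "AWS:<password>".
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    username, password = decoded.split(":", 1)
    # Use the bare registry host (without scheme) as the auths key.
    registry = proxy_endpoint
    if registry.startswith("https://"):
        registry = registry[len("https://"):]
    # Docker's "auth" field is base64("<username>:<password>").
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    config = {"auths": {registry: {"username": username, "password": password, "auth": auth}}}
    return json.dumps(config)


def pull_secret_manifest(name: str, namespace: str, config_json: str) -> dict:
    """Wrap the payload in a Secret manifest usable as an imagePullSecret."""
    encoded = base64.b64encode(config_json.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": encoded},
    }


if __name__ == "__main__":
    # Fake token for illustration only; a real one comes from ECR.
    token = base64.b64encode(b"AWS:example-password").decode("ascii")
    cfg = docker_config_json(token, "https://12345678.dkr.ecr.us-west-1.amazonaws.com")
    print(json.dumps(pull_secret_manifest("extcrcreds", "dev", cfg), indent=2))
```

Until a Secret like this exists (and is referenced by the pod), any pull from the ECR fails, which matches the first-deploy failures described above.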

To solve this issue, we would first need to understand why the Job isn't running soon enough (or whether it is working at all; I believe it is, since it is essentially the same as the CronJob version). Alternatively, the pods that depend on the auth token may need to be created after the Job has run. Is there a way for an existing pod to notice an updated (or, in this case, newly generated) Docker-login auth token?

Severity

This problem is only seen the first time an ECR is set up in QHub, and it resolves itself eventually, so I don't think it is a major obstruction at the moment (unless people other than me are using ECR...).

So I think it's one for the backlog at the moment rather than the next milestone.

Hope that helps - I'll let you tag appropriately @iameskild since it looks like you were on a roll :)

@iameskild
Member

Thanks a bunch for all the detail @danlester! I agree this seems like something for the backlog. I think in the future a feature like this would come in handy, especially if we allow users to "build" or "customize" their own QHub Docker images as mentioned in issues #785 / #715.

@danlester danlester self-assigned this Nov 26, 2021
@danlester
Contributor

After some more investigation:

The problem is more severe than I thought: some images (e.g. the JupyterHub image) will never pull if a private container registry is specified. A private JupyterLab image normally pulls fine.

Extension images (e.g. ent-qhub-control-panel) don't pull until you delete the pod so that it starts again.

Part of the issue is timing: the credentials aren't set early enough. Another problem is that the credentials are only patched onto the default service account, so if a deployment uses a different service account, the credentials will never be available to it.
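Both symptoms are consistent with how Kubernetes applies a service account's imagePullSecrets: the ServiceAccount admission controller copies them into a pod only when the pod is created, and only if the pod spec sets no imagePullSecrets of its own. The simplified model below is a sketch of that behaviour, not Kubernetes code; admit_pod and the "hub" service account name are hypothetical.

```python
from typing import Dict, List


def admit_pod(pod_spec: dict, sa_pull_secrets: Dict[str, List[str]]) -> dict:
    """Simplified model of the ServiceAccount admission step at pod creation."""
    spec = dict(pod_spec)  # shallow copy; the input is left untouched
    sa_name = spec.setdefault("serviceAccountName", "default")
    if not spec.get("imagePullSecrets"):
        # Only pods without pull secrets of their own inherit the SA's secrets.
        spec["imagePullSecrets"] = [{"name": s} for s in sa_pull_secrets.get(sa_name, [])]
    return spec


if __name__ == "__main__":
    # Hypothetical state: only the default service account has been patched.
    patched = {"default": ["extcrcreds"], "hub": []}
    print(admit_pod({}, patched)["imagePullSecrets"])  # [{'name': 'extcrcreds'}]
    print(admit_pod({"serviceAccountName": "hub"}, patched)["imagePullSecrets"])  # []
    # A pod admitted before the patch keeps its empty list until it is recreated.
    print(admit_pod({}, {"default": []})["imagePullSecrets"])  # []
```

The last case mirrors the extension images that only pull after the pod is deleted and recreated; the second mirrors images whose pods run under a non-default service account.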

The solution is:

  1. Set up the credentials earlier in the QHub deployment process.
  2. For Helm charts that use a non-default service account, specify extcrcreds as an explicit imagePullSecret. (Alternatively, the chart could be told to use the default service account instead.)
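At the level of a rendered manifest, step 2 amounts to listing extcrcreds in the pod template's imagePullSecrets whenever the template names a non-default service account. The sketch below is hypothetical (add_pull_secret is not a QHub or Helm function, and real charts expose this through their own, chart-specific values); it only shows the intended end state of a Deployment.

```python
import copy


def add_pull_secret(deployment: dict, secret_name: str = "extcrcreds") -> dict:
    """Return a copy of a Deployment with secret_name in its imagePullSecrets
    when the pod template uses a non-default service account."""
    result = copy.deepcopy(deployment)
    pod_spec = result["spec"]["template"]["spec"]
    if pod_spec.get("serviceAccountName", "default") == "default":
        # Left alone: such pods can inherit the secret from the patched
        # default service account (if they set no pull secrets of their own).
        return result
    secrets = pod_spec.setdefault("imagePullSecrets", [])
    if {"name": secret_name} not in secrets:
        secrets.append({"name": secret_name})  # idempotent: no duplicates
    return result


if __name__ == "__main__":
    hub = {"spec": {"template": {"spec": {"serviceAccountName": "hub", "containers": []}}}}
    print(add_pull_secret(hub)["spec"]["template"]["spec"]["imagePullSecrets"])
    # [{'name': 'extcrcreds'}]
```

Because the secret is referenced explicitly, such pods no longer depend on which service account was patched.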

@danlester danlester added type: bug 🐛 Something isn't working and removed needs more info labels Nov 26, 2021
@danlester danlester added this to the Release v0.4.0 milestone Nov 26, 2021
@danlester danlester changed the title External Container Reg login job External Container Reg creds not always available Nov 26, 2021