Custom configs for gateway clusters #481

Merged
merged 20 commits into from
Jan 9, 2020

@jhamman (Member) commented Nov 27, 2019

I'd like to use this branch to collect the initial customizations for the dask-gateway clusters that went live yesterday. A few things to check:

  • gateway clusters use image: ${JUPYTER_IMAGE_SPEC}
  • setup options for cluster specs (memory, cpu, etc.)
  • make sure gateway ips are static on GCP
  • https for endpoints
  • figure out how to make gateway clusters work with dask-labextension
  • set defaults for autoscaling and automatic cluster shutdown
  • set user limits for cpu/memory/cluster
  • make sure dask clusters are putting pods on correct nodes (check taints/affinities/etc)

cc @TomAugspurger

@TomAugspurger (Member)

> figure out how to make gateway clusters work with dask-labextension

I think this is fixed on dask-labextension master, and Jacob is planning to do a release today: dask/dask-labextension#93

@jhamman jhamman mentioned this pull request Dec 5, 2019
@@ -43,3 +43,9 @@ labextension:
    class: KubeCluster
    args: []
    kwargs: {}

gateway:
  address: http://34.68.195.134
Member

Can you document somewhere how we ensure that these addresses are stable?

And, I may be wrong, but this feels like something that dask-gateway / kubernetes should be able to do for us. I'm reading through https://kubernetes.io/docs/concepts/services-networking/service/#dns.

This Gateway is in the same Kubernetes deployment, so would gateway-api-dev-staging-dask-gateway.dev-staging find it? cc @jcrist if you have thoughts here.

In general, though, I don't think we can rely on that: it would only work when the Client is running on a machine within the same Kubernetes deployment.

Member

Actually, maybe this is working?

from dask_gateway import Gateway

g = Gateway(
    address="http://web-public-dev-staging-dask-gateway",
    proxy_address="tls://scheduler-public-dev-staging-dask-gateway",
    auth="jupyterhub",
)

g.list_clusters()
[]

Is that just a coincidence? I'm not sure why the .namespace of dev-staging isn't needed.

Member Author

I'll defer to @jcrist on the k8s dns functionality. If we want the public ips to be stable, we need to reserve them with GCP: https://cloud.google.com/compute/docs/ip-addresses/reserve-static-external-ip-address

Is it worth mapping these to pangeo dns entries (e.g. gateway.staging.hub.pangeo.io)? Would this help make these more usable outside this jhub?

Member

If you're connecting to the gateway from inside the same k8s cluster, then:

gateway:
  address: https://web-public-{name}
  proxy-address: https://scheduler-public-{name}
  auth:
    type: jupyterhub

will work, where name is the deployment name. I think if it's deployed in the default namespace then .{namespace} isn't part of the name.

If you're trying to connect from outside the cluster, then you'll want to expose a stable address to external clients. I'm not sure of the best way to handle that.
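For the in-cluster case, here is a minimal sketch of how the service names relate to Kubernetes DNS. The helper names `service_dns_names` and `gateway_addresses` are hypothetical, and the `http://` and `tls://` schemes are simply copied from the `Gateway(...)` snippet earlier in this thread. As far as Kubernetes DNS goes, the short name resolves from pods in the same namespace as the Service (via the pod's DNS search domains), which may be why `.dev-staging` was not needed above.

```python
def service_dns_names(service, namespace, cluster_domain="cluster.local"):
    """List the DNS names a Kubernetes Service is normally reachable under."""
    return [
        service,                                        # same-namespace pods only
        f"{service}.{namespace}",                       # any pod in the cluster
        f"{service}.{namespace}.svc.{cluster_domain}",  # fully qualified form
    ]


def gateway_addresses(release, namespace=None):
    """Build in-cluster gateway addresses for a deployment (release) name.

    Hypothetical helper: the service name prefixes follow the examples in
    this thread; the namespace suffix is only needed across namespaces.
    """
    suffix = f".{namespace}" if namespace else ""
    return {
        "address": f"http://web-public-{release}{suffix}",
        "proxy_address": f"tls://scheduler-public-{release}{suffix}",
    }


if __name__ == "__main__":
    print(service_dns_names("web-public-dev-staging-dask-gateway", "dev-staging"))
    print(gateway_addresses("dev-staging-dask-gateway"))
```

The resulting dictionary could be passed to `Gateway(**addresses, auth="jupyterhub")` from a pod inside the cluster.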

Member

For now, I think using the k8s dns stuff to just use the names rather than IPs makes sense.

In the future I suspect we'll have multiple gateways, some of which will be deployed to different k8s clusters. In that case we'll need to use the cloud provider's service for stable IPs / get a DNS entry.

@jcrist does having some config / API for discovering multiple gateways make sense? This would be some kind of global name: c.Gateway mapping. Someday we may want users connecting to this hub to be able to connect to a Gateway deployed on AWS, or perhaps GCP but in a different region. I haven't thought about it much.

Member

> does having some config / API for discovering multiple gateways make sense?

Hmmm, maybe? I'd have to think about what the user-facing api would look like.

For now I agree that you should focus on in-cluster users.

@jcrist (Member), Dec 6, 2019

One thing we could do is add dask-gateway behind the JupyterHub proxy as a service (JupyterHub services may be proxied, but aren't required to be). Then the web API would be available at https://jupyterhub-address/services/dask-gateway/. This simplifies the gateway.address field at least. Then you'd only need special handling for the scheduler-proxy. If you only want users internal to the cluster, you could make the scheduler-proxy a ClusterIP service, so you'd only need a load balancer for JupyterHub.


Alternatively we could put ingress-nginx in front of everything. This allows routing raw tcp services as well as http services, which would allow putting everything behind the same hostname (see https://kubernetes.github.io/ingress-nginx/user-guide/exposing-tcp-udp-services/). The routing would then be:

ingress-nginx, externally facing:
  - http
    - /hub/ -> configurable-http-proxy -> jupyterhub
    - /gateway/ -> dask-gateway-web-proxy -> gateway
  - tcp
    - port 8786 -> dask-gateway-scheduler-proxy

This removes the LoadBalancer services (only ingress-nginx needed), and puts everything through a single externally-facing server, which could be given a dns entry.
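To make the proposed layout concrete, here is a small illustrative model of the routing decision in Python. It is not ingress-nginx configuration; the tables and the `route_http` / `route_tcp` names are hypothetical and only mirror the paths, port, and backends listed above.

```python
# Hypothetical routing tables mirroring the layout sketched above.
HTTP_ROUTES = [
    ("/hub/", "configurable-http-proxy"),
    ("/gateway/", "dask-gateway-web-proxy"),
]
TCP_ROUTES = {8786: "dask-gateway-scheduler-proxy"}


def route_http(path):
    """Return the backend for an HTTP path by prefix match, or None."""
    for prefix, backend in HTTP_ROUTES:
        if path.startswith(prefix):
            return backend
    return None


def route_tcp(port):
    """Return the backend for a raw TCP port, or None if it is not exposed."""
    return TCP_ROUTES.get(port)


if __name__ == "__main__":
    print(route_http("/hub/login"))
    print(route_http("/gateway/api/clusters"))
    print(route_tcp(8786))
```

HTTP traffic is split by path while scheduler traffic is split by port, so a single externally facing hostname is enough for both.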

Member

> Alternatively we could put ingress-nginx in front of everything.

After a bit of reading I believe we can do the same with recent versions of traefik (>= 2.0) as well.

Member

Thanks. FYI, I think we're a little ways away from being able to do this. For now, just handling deployments within the same k8s cluster is reasonable :)

tjcrone and others added 3 commits December 9, 2019 07:44
Whitelist all GitHub users for AGU
Gives us free HTTPS, and avoids having an extra public IP!
@yuvipanda (Member)

@jhamman I just added a commit that makes the gateway URL available at https://staging.hub.pangeo.io/services/dask-gateway. Takes care of HTTPS, and one less public URL :)
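As a rough sketch, a client-side helper could derive the gateway address from the hub URL once the gateway is proxied as a JupyterHub service. The function name `hub_service_url` is hypothetical; the only thing assumed about JupyterHub is the `/services/<name>/` path convention mentioned earlier in this thread.

```python
from urllib.parse import urljoin


def hub_service_url(hub_url, service_name):
    """Return the proxied URL of a JupyterHub service under a hub base URL."""
    # Ensure a trailing slash so urljoin appends instead of replacing
    # the last path segment of the hub URL.
    base = hub_url if hub_url.endswith("/") else hub_url + "/"
    return urljoin(base, f"services/{service_name}/")


if __name__ == "__main__":
    print(hub_service_url("https://staging.hub.pangeo.io", "dask-gateway"))
```

Only the scheduler proxy address would then still need separate configuration.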

@jcrist (Member) commented Dec 16, 2019

You might want to change the web-proxy service type to ClusterIP to drop the now-unnecessary load balancer.

@yuvipanda (Member)

@jcrist makes sense. I can't find where that is set, though. I'm also curious - have you considered wrapping the TLS proxy with websockets? Would make routing / firewalling easier - many environments only allow outbound port 80/443, for example.

@jcrist (Member) commented Dec 17, 2019

> I can't find where that is set, though.

Apologies, I was on my phone when I wrote that. The helm chart is documented at: https://gateway.dask.org/install-kube.html#helm-chart-reference. To switch the web-proxy to use a ClusterIP instead of a LoadBalancer, the following should work:

webProxy:
  service:
    type: ClusterIP

@jcrist (Member) commented Dec 17, 2019

> I'm also curious - have you considered wrapping the TLS proxy with websockets? Would make routing / firewalling easier - many environments only allow outbound port 80/443, for example.

I haven't run into any systems where a port other than 80/443 can't be opened, but you definitely have more experience here. We don't want to wrap our communications in a WebSocket: WebSockets introduce an additional framing protocol inside a protocol that already handles framing, and thus add overhead. In either case, with traefik 2.0 (instead of the custom proxy I wrote) we can serve everything over the same port and traefik will handle separating HTTP requests from scheduler traffic for us.
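To give a sense of the framing overhead mentioned above, this sketch computes the size of a WebSocket frame header as laid out in RFC 6455, for a single unfragmented frame. The function name is hypothetical, and the numbers describe only the WebSocket layer, not whatever framing the scheduler protocol already does underneath.

```python
def websocket_header_size(payload_len, masked):
    """Return the RFC 6455 frame header size in bytes for one frame."""
    size = 2  # FIN/opcode byte plus mask bit / 7-bit length byte
    if payload_len > 65535:
        size += 8  # 64-bit extended payload length
    elif payload_len > 125:
        size += 2  # 16-bit extended payload length
    if masked:
        size += 4  # masking key, required on client-to-server frames
    return size


if __name__ == "__main__":
    for n in (100, 1_000, 1_000_000):
        print(n, websocket_header_size(n, masked=True))
```

Client-to-server frames must also be masked, which means touching every payload byte in addition to the extra header bytes.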

@jhamman (Member Author) commented Jan 9, 2020

I'm going to merge this and move the checklist to a new issue.

@jhamman jhamman merged commit b52b559 into pangeo-data:staging Jan 9, 2020
@jhamman jhamman mentioned this pull request Jan 9, 2020
8 tasks
rabernat added a commit that referenced this pull request Jan 10, 2020