Custom configs for gateway clusters #481
Conversation
I think this is fixed on dask-labextension master, and Jacob is planning to do a release today: dask/dask-labextension#93
```diff
@@ -43,3 +43,9 @@ labextension:
   class: KubeCluster
   args: []
   kwargs: {}

 gateway:
   address: http://34.68.195.134
```
Can you document somewhere how we ensure that these addresses are stable?
And, I may be wrong, but this feels like something that dask-gateway / kubernetes should be able to do for us. I'm reading through https://kubernetes.io/docs/concepts/services-networking/service/#dns.
This Gateway is in the same kubernetes deployment, so would `gateway-api-dev-staging-dask-gateway.dev-staging` find it? cc @jcrist if you have thoughts here.

In general though, we won't be able to rely on that I think. That would only potentially work when the Client is running on a machine within the same kubernetes deployment.
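For context, in-cluster Service DNS names follow the `<service>.<namespace>.svc.cluster.local` pattern, and the bare service name usually resolves from within the same namespace. A small sketch (the names below just echo the ones discussed above):

```python
def service_dns(service: str, namespace: str = "default") -> str:
    """Build the fully-qualified in-cluster DNS name for a k8s Service.

    Within the same namespace the bare service name usually resolves
    too; the ``.<namespace>`` suffix is needed across namespaces.
    """
    return f"{service}.{namespace}.svc.cluster.local"

# Names taken from the comment above, for illustration only:
addr = service_dns("gateway-api-dev-staging-dask-gateway", "dev-staging")
# -> "gateway-api-dev-staging-dask-gateway.dev-staging.svc.cluster.local"
```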
Actually, maybe this is working?
```python
g = Gateway(
    address="http://web-public-dev-staging-dask-gateway",
    proxy_address="tls://scheduler-public-dev-staging-dask-gateway",
    auth="jupyterhub",
)
g.list_clusters()
# []
```

Is that just a coincidence? I'm not sure why the `.namespace` suffix (`dev-staging`) isn't needed.
I'll defer to @jcrist on the k8s DNS functionality. If we want the public IPs to be stable, we need to reserve them with GCP: https://cloud.google.com/compute/docs/ip-addresses/reserve-static-external-ip-address

Is it worth mapping these to pangeo DNS entries (e.g. `gateway.staging.hub.pangeo.io`)? Would this help make these more usable outside this jhub?
If you're connecting to the gateway from inside the same k8s cluster, then:
```yaml
gateway:
  address: https://web-public-{name}
  proxy-address: https://scheduler-public-{name}
  auth:
    type: jupyterhub
```

will work, where `name` is the deployment name. I think if it's deployed in the default namespace then `.{namespace}` isn't part of the name.
If you're trying to connect from outside the cluster, then you'll want to expose a stable address to external clients. I'm not sure the best way to handle that.
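As a small illustration of the in-cluster naming scheme described above, a Python sketch that expands the `{name}` templates into concrete addresses (the release name below is an assumption, and the `web-public-` / `scheduler-public-` prefixes are taken from the comment, not from inspecting the chart):

```python
def gateway_addresses(name: str) -> dict:
    """Expand the in-cluster service-name templates quoted above.

    ``name`` is the dask-gateway deployment/release name.
    """
    return {
        "address": f"https://web-public-{name}",
        "proxy_address": f"https://scheduler-public-{name}",
    }

# Hypothetical release name, for illustration only:
addrs = gateway_addresses("dev-staging-dask-gateway")
```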
For now, I think using the k8s dns stuff to just use the names rather than IPs makes sense.
In the future I suspect we'll have multiple gateways, some of which will be deployed to different k8s clusters. In that case we'll need to use the cloud provider's service for stable IPs / get a DNS entry.
@jcrist does having some config / API for discovering multiple gateways make sense? This would be some kind of global `name: c.Gateway` mapping. Someday we may want users connecting to this hub to be able to connect to a Gateway deployed on AWS, or perhaps GCP but in a different region. I haven't thought about it much.
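To make the idea concrete, a purely hypothetical sketch of such a registry (dask-gateway has no discovery API like this today; every name and address below is invented):

```python
# Purely hypothetical -- dask-gateway has no such discovery API today.
GATEWAYS = {
    "gcp-us-central1": {
        "address": "https://web-public-dev-staging-dask-gateway",
        "proxy_address": "tls://scheduler-public-dev-staging-dask-gateway",
        "auth": "jupyterhub",
    },
    "aws-us-east-1": {
        "address": "https://gateway.aws.example.org",
        "proxy_address": "tls://gateway.aws.example.org:8786",
        "auth": "jupyterhub",
    },
}

def gateway_kwargs(name: str) -> dict:
    """Look up a named gateway; the result is what you'd pass to Gateway()."""
    return GATEWAYS[name]
```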
> does having some config / API for discovering multiple gateways make sense?
Hmmm, maybe? I'd have to think about what the user-facing api would look like.
For now I agree that you should focus on in-cluster users.
One thing we could do is add dask-gateway behind the JupyterHub proxy as a service (JupyterHub services may be proxied, but aren't required to be). Then the web api would be available at `https://jupyterhub-address/services/dask-gateway/`. This simplifies the `gateway.address` field at least. Then you'd only need special handling for the scheduler-proxy. If you only want users internal to the cluster, you could make the scheduler-proxy a `ClusterIP` service, so you'd only need a load balancer for JupyterHub.
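A sketch of what registering the gateway as a proxied JupyterHub service could look like in `jupyterhub_config.py` (the service name and in-cluster URL here are assumptions, not values from this PR):

```python
# jupyterhub_config.py -- sketch only; the name and URL are assumptions.
c.JupyterHub.services = [
    {
        # Services configured with a `url` are proxied at /services/<name>/
        "name": "dask-gateway",
        # In-cluster address of the gateway web proxy:
        "url": "http://web-public-dev-staging-dask-gateway",
    }
]
```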
Alternatively we could put `ingress-nginx` in front of everything. This allows routing raw TCP services as well as HTTP services, which would allow putting everything behind the same hostname (see https://kubernetes.github.io/ingress-nginx/user-guide/exposing-tcp-udp-services/). The routing would then be:
```
ingress-nginx, externally facing:
- http
  - /hub/     -> configurable-http-proxy      -> jupyterhub
  - /gateway/ -> dask-gateway-web-proxy       -> gateway
- tcp
  - port 8786 -> dask-gateway-scheduler-proxy
```
This removes the `LoadBalancer` services (only ingress-nginx needed), and puts everything through a single externally-facing server, which could be given a DNS entry.
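For the TCP leg, ingress-nginx takes a `tcp-services` ConfigMap mapping external ports to backend services (per the doc linked above); a sketch, with the namespace and service name assumed:

```yaml
# Sketch of the ingress-nginx tcp-services ConfigMap; the namespace
# and service names below are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  # external port -> <namespace>/<service>:<port>
  "8786": "dev-staging/scheduler-public-dev-staging-dask-gateway:8786"
```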
> Alternatively we could put `ingress-nginx` in front of everything.
After a bit of reading I believe we can do the same with recent versions of traefik (>= 2.0) as well.
Thanks. FYI, I think we're a little ways away from being able to do this. For now, just handling deployments within the same k8s cluster is reasonable :)
Whitelist all GitHub users for AGU
Gives us free HTTPS, and avoids having an extra public IP!
@jhamman I just added a commit that makes the gateway URL available at
You might want to change the web-proxy service type to `ClusterIP` to drop the now-unnecessary LoadBalancer.
@jcrist makes sense. I can't find where that is set, though. I'm also curious - have you considered wrapping the TLS proxy with websockets? Would make routing / firewalling easier - many environments only allow outbound port 80/443, for example.
Apologies, I was on my phone when I wrote that. The helm chart is documented at: https://gateway.dask.org/install-kube.html#helm-chart-reference. To switch the web-proxy to a `ClusterIP` service:

```yaml
webProxy:
  service:
    type: ClusterIP
```
I haven't run into any systems where a non-80/443 port can't be opened, but you definitely have more experience here. We don't want to wrap our communications in a web socket - web sockets introduce an additional framing protocol inside a protocol that already handles framing, and thus add overhead. In either case, with traefik 2.0 (instead of the custom proxy I wrote) we can serve everything over the same port, and traefik will handle separating HTTP requests from scheduler traffic for us.
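A rough sketch of what that single-port traefik (>= 2.0) setup could look like as dynamic configuration, with HTTP routed by path and scheduler TLS traffic passed through by SNI; all names and addresses here are assumptions:

```yaml
# Sketch only -- entrypoint, router, and service names are assumptions.
http:
  routers:
    gateway-web:
      entryPoints: [websecure]
      rule: "PathPrefix(`/gateway/`)"
      service: gateway-web
  services:
    gateway-web:
      loadBalancer:
        servers:
          - url: "http://web-public-dev-staging-dask-gateway"
tcp:
  routers:
    gateway-scheduler:
      entryPoints: [websecure]
      # TLS scheduler connections are separated from HTTP by SNI
      rule: "HostSNI(`*`)"
      tls:
        passthrough: true
      service: gateway-scheduler
  services:
    gateway-scheduler:
      loadBalancer:
        servers:
          - address: "scheduler-public-dev-staging-dask-gateway:8786"
```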
I'm going to merge this and move the checklist to a new issue.
I'd like to use this branch to collect the initial customizations for the dask-gateway clusters that went live yesterday. A few things to check:
- `image: ${JUPYTER_IMAGE_SPEC}`
cc @TomAugspurger