Deploying Hubs to Azure cluster for CarbonPlan #838

Merged
merged 8 commits into from
Nov 19, 2021


sgibson91
Member

@sgibson91 sgibson91 commented Nov 18, 2021

This is a first pass at hub config for CarbonPlan's Azure setup. It's a mish-mash of the Justice Innovation Lab config for Azure stuff, and the AWS CarbonPlan deployment for CarbonPlan stuff. I would appreciate early eyes on this to see if I've missed anything major.

I am particularly thinking about a couple of cases of serviceAccountName: cloud-user-sa that I left out from the original AWS CarbonPlan config, mostly because I'm not sure what it is or what the Azure equivalent is. This isn't critical yet and can be addressed in other issues/PRs.

Note: This PR also reverts a commit that removed support for kubeconfig in the deployer as part of the work on AWS.

@sgibson91 sgibson91 changed the title First pass at config for carbonplan hub on Azure Deploying Hubs to Azure cluster for CarbonPlan Nov 18, 2021
@sgibson91 sgibson91 self-assigned this Nov 18, 2021
@sgibson91 sgibson91 requested a review from yuvipanda November 18, 2021 15:16
@sgibson91 sgibson91 mentioned this pull request Nov 18, 2021
9 tasks
@yuvipanda
Member

cloud-user-sa is defined in

name: "cloud-user-sa",

and is similar to what we were trying to do with Config Connector on GCP. It uses AWS IRSA (IAM Roles for Service Accounts) to give code running under a specific Kubernetes service account a specific AWS IAM role. In this case, it gives them full S3 access.

Not sure what the equivalent for Azure would be.
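
As a rough illustration of the IRSA mechanism described above, the sketch below builds a ServiceAccount manifest carrying the `eks.amazonaws.com/role-arn` annotation that EKS uses to map a service account to an IAM role. The helper name, the namespace, and the role ARN are hypothetical placeholders; the real values are not given in this thread.

```python
def irsa_service_account(name, namespace, role_arn):
    """Return a Kubernetes ServiceAccount manifest annotated for AWS IRSA."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # EKS reads this annotation to decide which IAM role pods using
            # this service account are allowed to assume.
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


# Placeholder namespace and ARN, for illustration only.
manifest = irsa_service_account(
    "cloud-user-sa",
    "staging",
    "arn:aws:iam::123456789012:role/example-s3-access",
)
```

Pods that set serviceAccountName: cloud-user-sa would then pick up the role's permissions, which in this case means object storage access.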

@sgibson91
Member Author

it gives them full S3 access.

Is that basically read/write to some kind of storage?

@yuvipanda
Member

Is that basically read/write to some kind of storage?

Yep, read-write access to object storage!

@sgibson91
Member Author

Nice! I have a rough idea what we could use here - will do some Googling tomorrow :)

@jhamman

jhamman commented Nov 18, 2021

Noting that the service account feature is going to be super nice but is not strictly required for an MVP deployment. If you want to set it aside for now, I think that would be fine for us.

@sgibson91
Member Author

Wonderful, cheers @jhamman!

@sgibson91
Member Author

I get this error when trying to use the deployer on kubeconfig-type hubs. Is that normal?

$ python deployer deploy-support carbonplan-azure
Traceback (most recent call last):
  File "/usr/local/Caskroom/miniconda/base/envs/pilot-hubs/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/Caskroom/miniconda/base/envs/pilot-hubs/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/sgibson/source/github/2i2c-org/pilot-hubs/deployer/__main__.py", line 235, in <module>
    main()
  File "/Users/sgibson/source/github/2i2c-org/pilot-hubs/deployer/__main__.py", line 224, in main
    deploy_support(args.cluster_name)
  File "/Users/sgibson/source/github/2i2c-org/pilot-hubs/deployer/__main__.py", line 48, in deploy_support
    with cluster.auth():
  File "/usr/local/Caskroom/miniconda/base/envs/pilot-hubs/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/Users/sgibson/source/github/2i2c-org/pilot-hubs/deployer/hub.py", line 46, in auth
    raise ValueError(f'Provider {self.spec["provider"]} not supported')
ValueError: Provider kubeconfig not supported

@yuvipanda
Member

oh, apparently @damianavila removed support for kubeconfig in 3759763.
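
For context, the traceback shows `cluster.auth()` raising as soon as it sees a provider it has no branch for. A minimal sketch of what a provider dispatch with a kubeconfig branch could look like is below. It is not the deployer's actual code: the `Cluster` class, the `auth_kubeconfig` method, and the `{"kubeconfig": {"file": ...}}` spec layout are assumptions made for illustration.

```python
import os
from contextlib import contextmanager


class Cluster:
    """Simplified, hypothetical stand-in for the deployer's cluster object."""

    def __init__(self, spec):
        self.spec = spec

    @contextmanager
    def auth(self):
        provider = self.spec["provider"]
        if provider == "kubeconfig":
            with self.auth_kubeconfig():
                yield
        else:
            # Same failure mode as in the traceback above.
            raise ValueError(f"Provider {provider} not supported")

    @contextmanager
    def auth_kubeconfig(self):
        # Assumed spec layout: {"kubeconfig": {"file": "<path>"}}.
        path = self.spec["kubeconfig"]["file"]
        previous = os.environ.get("KUBECONFIG")
        os.environ["KUBECONFIG"] = path
        try:
            yield
        finally:
            # Restore the previous value so later commands are unaffected.
            if previous is None:
                os.environ.pop("KUBECONFIG", None)
            else:
                os.environ["KUBECONFIG"] = previous
```

With a branch like this in place, `with cluster.auth():` would point kubectl and helm at the given kubeconfig for the duration of the block instead of raising.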

@sgibson91
Member Author

Ok, I'll revert that commit for now just to get this hub up

@sgibson91
Member Author

Ok, we're rolling!

@sgibson91
Member Author

sgibson91 commented Nov 19, 2021

The api-staging-dask-gateway pod is in CrashLoopBackOff state with the following error, which implies that the POD_NAMESPACE environment variable is not being set.

[I 2021-11-19 11:22:31.834 DaskGateway] Starting dask-gateway-server - version 0.9.0
Traceback (most recent call last):
  File "/usr/local/bin/dask-gateway-server", line 33, in <module>
    sys.exit(load_entry_point('dask-gateway-server==0.9.0', 'console_scripts', 'dask-gateway-server')())
  File "/usr/local/lib/python3.8/site-packages/traitlets/config/application.py", line 663, in launch_instance
    app.initialize(argv)
  File "<decorator-gen-6>", line 2, in initialize
  File "/usr/local/lib/python3.8/site-packages/traitlets/config/application.py", line 87, in catch_config_error
    return method(app, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/dask_gateway_server/app.py", line 170, in initialize
    self.load_config_file(self.config_file)
  File "<decorator-gen-5>", line 2, in load_config_file
  File "/usr/local/lib/python3.8/site-packages/traitlets/config/application.py", line 87, in catch_config_error
    return method(app, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/traitlets/config/application.py", line 601, in load_config_file
    for (config, filename) in self._load_config_files(filename, path=path, log=self.log,
  File "/usr/local/lib/python3.8/site-packages/traitlets/config/application.py", line 563, in _load_config_files
    config = loader.load_config()
  File "/usr/local/lib/python3.8/site-packages/traitlets/config/loader.py", line 457, in load_config
    self._read_file_as_dict()
  File "/usr/local/lib/python3.8/site-packages/traitlets/config/loader.py", line 489, in _read_file_as_dict
    py3compat.execfile(conf_filename, namespace)
  File "/usr/local/lib/python3.8/site-packages/ipython_genutils/py3compat.py", line 198, in execfile
    exec(compiler(f.read(), fname, 'exec'), glob, loc)
  File "/etc/dask-gateway/dask_gateway_config.py", line 98, in <module>
    pod_namespace = os.environ['POD_NAMESPACE']
  File "/usr/local/lib/python3.8/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'POD_NAMESPACE'

This is a bit of config I pilfered from JIL to correctly set up the subdirs for the Azure File storage
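
The `KeyError` at the bottom of the traceback comes from indexing `os.environ` directly, which gives no hint about where the variable should have come from. A small sketch of a more descriptive lookup is below; `require_env` is a hypothetical helper, not something in dask-gateway or the deployer.

```python
import os


def require_env(name, environ=os.environ):
    """Return environ[name], or raise a RuntimeError naming the missing variable."""
    try:
        return environ[name]
    except KeyError:
        raise RuntimeError(
            f"Environment variable {name} is not set; check the pod spec's env section"
        ) from None
```

A config file could then call `require_env("POD_NAMESPACE")`. In Kubernetes, a variable like this is commonly populated through the downward API from `metadata.namespace`, so a missing value usually points at the pod template rather than at the Python code.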

@sgibson91
Member Author

I guess that means this extra config isn't being applied

daskhub-01-add-dask-gateway-values: |
  # 1. Sets `DASK_GATEWAY__PROXY_ADDRESS` in the singleuser environment.
  # 2. Adds the URL for the Dask Gateway JupyterHub service.
  import os
  # These are set by jupyterhub.
  release_name = os.environ["HELM_RELEASE_NAME"]
  release_namespace = os.environ["POD_NAMESPACE"]
  if "PROXY_HTTP_SERVICE_HOST" in os.environ:
      # https is enabled, we want to use the internal http service.
      gateway_address = "http://{}:{}/services/dask-gateway/".format(
          os.environ["PROXY_HTTP_SERVICE_HOST"],
          os.environ["PROXY_HTTP_SERVICE_PORT"],
      )
      print("Setting DASK_GATEWAY__ADDRESS {} from HTTP service".format(gateway_address))
  else:
      gateway_address = "http://proxy-public/services/dask-gateway"
      print("Setting DASK_GATEWAY__ADDRESS {}".format(gateway_address))
  # Internal address to connect to the Dask Gateway.
  c.KubeSpawner.environment.setdefault("DASK_GATEWAY__ADDRESS", gateway_address)
  # Internal address for the Dask Gateway proxy.
  c.KubeSpawner.environment.setdefault("DASK_GATEWAY__PROXY_ADDRESS", "gateway://traefik-{}-dask-gateway.{}:80".format(release_name, release_namespace))
  # Relative address for the dashboard link.
  c.KubeSpawner.environment.setdefault("DASK_GATEWAY__PUBLIC_ADDRESS", "/services/dask-gateway/")
  # Use JupyterHub to authenticate with Dask Gateway.
  c.KubeSpawner.environment.setdefault("DASK_GATEWAY__AUTH__TYPE", "jupyterhub")
  # Adds Dask Gateway as a JupyterHub service to make the gateway available at
  # {HUB_URL}/services/dask-gateway
  service_url = "http://traefik-{}-dask-gateway.{}".format(release_name, release_namespace)
  for service in c.JupyterHub.services:
      if service["name"] == "dask-gateway":
          if not service.get("url", None):
              print("Adding dask-gateway service URL")
              service.setdefault("url", service_url)
          break
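
To make the address selection in the config above easier to reason about, here is the same logic pulled out into two pure functions that take the environment as a plain mapping. The function names are hypothetical; the strings are copied from the config.

```python
def choose_gateway_address(environ):
    """Pick the address singleuser pods use to reach Dask Gateway."""
    if "PROXY_HTTP_SERVICE_HOST" in environ:
        # https is enabled, so go through the internal http service.
        return "http://{}:{}/services/dask-gateway/".format(
            environ["PROXY_HTTP_SERVICE_HOST"],
            environ["PROXY_HTTP_SERVICE_PORT"],
        )
    return "http://proxy-public/services/dask-gateway"


def proxy_address(release_name, release_namespace):
    """Build the internal address of the Dask Gateway traefik proxy."""
    return "gateway://traefik-{}-dask-gateway.{}:80".format(
        release_name, release_namespace
    )
```

With a release name and namespace of `staging`, `proxy_address` yields a host of `traefik-staging-dask-gateway.staging`, and an environment without `PROXY_HTTP_SERVICE_HOST` falls through to the `proxy-public` address.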

@sgibson91
Member Author

Seems like the above config is being applied but maybe hits some errors?

$ kubectl logs hub-5fb8864457-9sntn -c hub
Loading /usr/local/etc/jupyterhub/secret/values.yaml
No config at /usr/local/etc/jupyterhub/existing-secret/values.yaml
Loading extra config: 01-working-dir
Loading extra config: 02-prometheus
Loading extra config: 03-no-setuid
Loading extra config: 04-custom-theme
Loading extra config: 05-custom-admin
Loading extra config: 06-cloud-storage-bucket
Loading extra config: daskhub-01-add-dask-gateway-values
Setting DASK_GATEWAY__ADDRESS http://proxy-public/services/dask-gateway
Adding dask-gateway service URL
[I 2021-11-19 11:21:02.006 JupyterHub app:2459] Running JupyterHub version 1.4.2
[I 2021-11-19 11:21:02.006 JupyterHub app:2489] Using Authenticator: oauthenticator.auth0.Auth0OAuthenticator-14.2.0
[I 2021-11-19 11:21:02.006 JupyterHub app:2489] Using Spawner: builtins.CustomSpawner
[I 2021-11-19 11:21:02.006 JupyterHub app:2489] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-1.4.2
[I 2021-11-19 11:21:02.122 JupyterHub provider:576] Updating oauth client service-configurator
[I 2021-11-19 11:21:02.154 JupyterHub provider:576] Updating oauth client service-dask-gateway
[I 2021-11-19 11:21:02.208 JupyterHub app:2526] Initialized 0 spawners in 0.003 seconds
[I 2021-11-19 11:21:02.211 JupyterHub app:2738] Not starting proxy
[I 2021-11-19 11:21:02.251 JupyterHub app:2774] Hub API listening on http://:8081/hub/
[I 2021-11-19 11:21:02.251 JupyterHub app:2776] Private Hub API connect url http://hub:8081/hub/
[I 2021-11-19 11:21:02.251 JupyterHub app:2789] Starting managed service cull-idle
[I 2021-11-19 11:21:02.251 JupyterHub service:339] Starting service 'cull-idle': ['python3', '-m', 'jupyterhub_idle_culler', '--url=http://localhost:8081/hub/api', '--timeout=3600', '--cull-every=600', '--concurrency=10']
[I 2021-11-19 11:21:02.253 JupyterHub service:121] Spawning python3 -m jupyterhub_idle_culler --url=http://localhost:8081/hub/api --timeout=3600 --cull-every=600 --concurrency=10
[I 2021-11-19 11:21:02.258 JupyterHub app:2789] Starting managed service configurator at http://configurator:10101
[I 2021-11-19 11:21:02.258 JupyterHub service:339] Starting service 'configurator': ['python3', '-m', 'jupyterhub_configurator.app', '--Configurator.config_file=/usr/local/etc/jupyterhub-configurator/jupyterhub_configurator_config.py']
[I 2021-11-19 11:21:02.260 JupyterHub service:121] Spawning python3 -m jupyterhub_configurator.app --Configurator.config_file=/usr/local/etc/jupyterhub-configurator/jupyterhub_configurator_config.py
[I 2021-11-19 11:21:02.556 JupyterHub log:189] 200 GET /hub/api/users (cull-idle@::1) 83.91ms
[I 2021-11-19 11:21:02.851 JupyterHub app:2798] Adding external service dask-gateway at http://traefik-staging-dask-gateway.staging
[W 2021-11-19 11:21:02.858 JupyterHub utils:221] Server at http://traefik-staging-dask-gateway.staging:80/services/dask-gateway/ responded with error: 503
[W 2021-11-19 11:21:02.996 JupyterHub utils:221] Server at http://traefik-staging-dask-gateway.staging:80/services/dask-gateway/ responded with error: 503
[W 2021-11-19 11:21:03.365 JupyterHub utils:221] Server at http://traefik-staging-dask-gateway.staging:80/services/dask-gateway/ responded with error: 503
[W 2021-11-19 11:21:03.886 JupyterHub utils:221] Server at http://traefik-staging-dask-gateway.staging:80/services/dask-gateway/ responded with error: 503
[W 2021-11-19 11:21:03.910 JupyterHub utils:221] Server at http://traefik-staging-dask-gateway.staging:80/services/dask-gateway/ responded with error: 503
[W 2021-11-19 11:21:03.940 JupyterHub utils:221] Server at http://traefik-staging-dask-gateway.staging:80/services/dask-gateway/ responded with error: 503
[E 2021-11-19 11:21:03.940 JupyterHub app:2825] Cannot connect to external service dask-gateway at http://traefik-staging-dask-gateway.staging. Is it running?
[I 2021-11-19 11:21:03.941 JupyterHub app:2798] Adding external service hub-health
[I 2021-11-19 11:21:03.943 JupyterHub proxy:347] Checking routes
[I 2021-11-19 11:21:03.943 JupyterHub proxy:432] Adding route for Hub: / => http://hub:8081
[W 2021-11-19 11:21:03.944 JupyterHub proxy:400] Adding missing route for configurator (Server(url=http://configurator:10101/services/configurator/, bind_url=http://configurator:10101/services/configurator/))
[W 2021-11-19 11:21:03.945 JupyterHub proxy:400] Adding missing route for dask-gateway (Server(url=http://traefik-staging-dask-gateway.staging:80/services/dask-gateway/, bind_url=http://traefik-staging-dask-gateway.staging:80/services/dask-gateway/))
[I 2021-11-19 11:21:03.947 JupyterHub proxy:266] Adding service configurator to proxy /services/configurator/ => http://configurator:10101
[I 2021-11-19 11:21:03.948 JupyterHub proxy:266] Adding service dask-gateway to proxy /services/dask-gateway/ => http://traefik-staging-dask-gateway.staging:80
[I 2021-11-19 11:21:03.957 JupyterHub app:2849] JupyterHub is now running at http://:8000

@sgibson91
Member Author

Solved this by moving the extraConfig for dynamic subpaths to the correct part of the hub config!

@sgibson91
Member Author

Now we have a problem with scheduling dask workers though

Normal   NotTriggerScaleUp  20s   cluster-autoscaler  pod didn't trigger scale-up: 6 node(s) had taint {hub.jupyter.org_dedicated: user}, that the pod didn't tolerate, 1 node(s) didn't match Pod's node affinity

@sgibson91
Member Author

Now we have a problem with scheduling dask workers though

Normal   NotTriggerScaleUp  20s   cluster-autoscaler  pod didn't trigger scale-up: 6 node(s) had taint {hub.jupyter.org_dedicated: user}, that the pod didn't tolerate, 1 node(s) didn't match Pod's node affinity

I think this is due to a bug in the Azure Terraform code that means we didn't actually create the dask pools. I will open a separate PR.
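
The autoscaler message says the worker pod does not tolerate the `hub.jupyter.org_dedicated: user` taint on the existing nodes. The sketch below approximates the Kubernetes toleration matching rules to show why such a pod is rejected. The function names are hypothetical, and the `NoSchedule` effect and the example tolerations in the usage lines are assumptions, since the event does not show them.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key")
    if operator == "Exists":
        # An empty key with Exists matches every taint key.
        return not key or key == taint["key"]
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def untolerated_taints(tolerations, taints):
    """Return the taints that none of the pod's tolerations match."""
    return [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]


# Assumed effect; only the key and value appear in the event above.
user_taint = {"key": "hub.jupyter.org_dedicated", "value": "user", "effect": "NoSchedule"}
# Hypothetical worker toleration that targets a different value.
worker_tolerations = [
    {"key": "hub.jupyter.org_dedicated", "operator": "Equal", "value": "worker", "effect": "NoSchedule"}
]
blocked = untolerated_taints(worker_tolerations, [user_taint])
```

Here `blocked` still contains the user taint, so the pod cannot land on user nodes. If no node pool exists that the worker pods both tolerate and match by affinity, the autoscaler has nothing to scale up, which is consistent with the missing dask pools.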

@sgibson91
Member Author

#839

@sgibson91 sgibson91 marked this pull request as ready for review November 19, 2021 15:57
@sgibson91
Member Author

sgibson91 commented Nov 19, 2021

Ok, we now have a service running at https://staging.azure.carbonplan.2i2c.cloud! 🎉 If all looks good with this PR, I will deploy the prod hub and merge ✨

@sgibson91
Member Author

sgibson91 commented Nov 19, 2021

For whatever reason, I can't get https://azure.carbonplan.2i2c.cloud to work, so I'm going to call it https://prod.azure.carbonplan.2i2c.cloud for now and come back to it.

@sgibson91
Member Author

Prod is alive! ✨ https://prod.azure.carbonplan.2i2c.cloud

@sgibson91 sgibson91 merged commit d3da0f3 into 2i2c-org:master Nov 19, 2021
@sgibson91 sgibson91 deleted the carbonplan-new-azure-hub branch November 19, 2021 16:59

3 participants