Refactor GCP terraform code + add MOEM-IGE hub #429

Merged on Jun 24, 2021 (27 commits)
Changes from 22 commits
Commits
1eb3b86
Provision per-cluster support components
yuvipanda Jun 8, 2021
2361d51
Deploy support components from deployer script
yuvipanda Jun 9, 2021
11a393e
Add hub for MOEM-IGE group
yuvipanda May 25, 2021
119bdcf
Use g1-small instances for core nodes
yuvipanda May 27, 2021
ad19ba5
meom: Use JupyterLab as default interface
yuvipanda May 28, 2021
ecb25c3
Refactor GCP terraform code
yuvipanda May 30, 2021
a34361c
Split cluster terraform setup to its own file
yuvipanda Jun 3, 2021
2679ae8
Add a data & scratch GCS bucket
yuvipanda Jun 3, 2021
ad49f4e
Fix dask worker config defaulting to notebook node config
yuvipanda Jun 8, 2021
01699d3
Enable workload identity when config connector is enabled
yuvipanda Jun 8, 2021
7e6c5d1
Optimize autoscaler profile for batch workloads
yuvipanda Jun 8, 2021
02c4d6f
Document why we use cloud-platform access scope
yuvipanda Jun 9, 2021
6088bf9
Add warning about initial node count
yuvipanda Jun 14, 2021
e90bca8
Autogenreate Terraform variable documentation
yuvipanda Jun 14, 2021
bc4469c
Point RTD to our environment.yml file
yuvipanda Jun 15, 2021
4a43fd7
Add terraform conventions doc
yuvipanda Jun 15, 2021
16b66ff
Create GKE-specific cluster design docs
yuvipanda Jun 16, 2021
9ac2c05
Add comments on config connector / netpol settings
yuvipanda Jun 16, 2021
c77ef3e
Don't try to auto-built tf reference docs
yuvipanda Jun 16, 2021
ebdb697
Fix project SA description
yuvipanda Jun 16, 2021
70b39e8
Fix requirements.txt syntax
yuvipanda Jun 16, 2021
9248021
Add bigger instances for meom-ige
yuvipanda Jun 18, 2021
eaa8b71
Fix typo
yuvipanda Jun 23, 2021
c000c24
Fix typo
yuvipanda Jun 23, 2021
f04bc51
Revert "Don't try to auto-built tf reference docs"
yuvipanda Jun 24, 2021
e7242a9
Revert "Fix requirements.txt syntax"
yuvipanda Jun 24, 2021
f8cf3f7
Don't autorender tfdocs
yuvipanda Jun 24, 2021
9 changes: 9 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,9 @@
version: 2

build:
  image: latest

python:
  version: 3.8
  install:
    - requirements: docs/requirements.txt
10 changes: 10 additions & 0 deletions config/hubs/2i2c.cluster.yaml
@@ -6,6 +6,16 @@ gcp:
  project: two-eye-two-see
  cluster: pilot-hubs-cluster
  zone: us-central1-b
support:
  config:
    grafana:
      ingress:
        hosts:
          - grafana.pilot.2i2c.cloud
        tls:
          - secretName: grafana-tls
            hosts:
              - grafana.pilot.2i2c.cloud
hubs:
  - name: staging
    domain: staging.pilot.2i2c.cloud
144 changes: 144 additions & 0 deletions config/hubs/meom-ige.cluster.yaml
@@ -0,0 +1,144 @@
name: meom-ige
provider: gcp
gcp:
  key: secrets/meom.json
  project: meom-ige-cnrs
  cluster: meom-ige-cluster
  zone: us-central1-b
hubs:
  - name: staging
    domain: staging.meom-ige.2i2c.cloud
    template: daskhub
    auth0:
      connection: github
    config: &meomConfig
      basehub:
        nfsPVC:
          nfs:
            # from https://docs.aws.amazon.com/efs/latest/ug/mounting-fs-nfs-mount-settings.html
            mountOptions:
              - rsize=1048576
              - wsize=1048576
              - timeo=600
              - soft # We pick soft over hard, so NFS lockups don't lead to hung processes
              - retrans=2
              - noresvport
            serverIP: nfs-server-01
            baseShareName: /export/home-01/homes/
        jupyterhub:
          custom:
            homepage:
              templateVars:
                org:
                  name: "SWOT Ocean Pangeo Team"
                  logo_url: https://2i2c.org/media/logo.png
                  url: https://meom-group.github.io/
                designed_by:
                  name: 2i2c
                  url: https://2i2c.org
                operated_by:
                  name: 2i2c
                  url: https://2i2c.org
                funded_by:
                  name: SWOT Ocean Pangeo Team
                  url: https://meom-group.github.io/
          singleuser:
            extraEnv:
              DATA_BUCKET: gcs://meom-ige-data
              SCRATCH_BUCKET: 'gcs://meom-ige-scratch/$(JUPYTERHUB_USER)'
            profileList:
              # The mem-guarantees are here so k8s doesn't schedule other pods
              # on these nodes. They need to be just under total allocatable
              # RAM on a node, not total node capacity
              - display_name: "Small"
                description: "~2 CPU, ~8G RAM"
                kubespawner_override:
                  mem_limit: 8G
                  mem_guarantee: 5.5G
                  node_selector:
                    node.kubernetes.io/instance-type: e2-standard-2
              - display_name: "Medium"
                description: "~8 CPU, ~32G RAM"
                kubespawner_override:
                  mem_limit: 32G
                  mem_guarantee: 25G
                  node_selector:
                    node.kubernetes.io/instance-type: e2-standard-8
              - display_name: "Large"
                description: "~16 CPU, ~64G RAM"
                kubespawner_override:
                  mem_limit: 64G
                  mem_guarantee: 55G
                  node_selector:
                    node.kubernetes.io/instance-type: e2-standard-16
              - display_name: "Very Large"
                description: "~32 CPU, ~128G RAM"
                kubespawner_override:
                  mem_limit: 128G
                  mem_guarantee: 115G
                  node_selector:
                    node.kubernetes.io/instance-type: e2-standard-32
              - display_name: "Huge"
                description: "~64 CPU, ~256G RAM"
                kubespawner_override:
                  mem_limit: 256G
                  mem_guarantee: 230G
                  node_selector:
                    node.kubernetes.io/instance-type: n2-standard-64
            defaultUrl: /lab
            image:
              name: pangeo/pangeo-notebook
              tag: 2021.02.19
          scheduling:
            userPlaceholder:
              enabled: false
              replicas: 0
            userScheduler:
              enabled: false
          proxy:
            service:
              type: LoadBalancer
            https:
              enabled: true
            chp:
              resources:
                requests:
                  # FIXME: We want no guarantees here!!!
                  # This is lowest possible value
                  cpu: 0.01
                  memory: 1Mi
          hub:
            resources:
              requests:
                # FIXME: We want no guarantees here!!!
                # This is lowest possible value
                cpu: 0.01
                memory: 1Mi
            config:
              Authenticator:
                allowed_users: &users
                  - roxyboy
                  - lesommer
                  - auraoupa
                  - yuvipanda
                  - choldgraf
                  - GeorgianaElena
                admin_users: *users

            allowNamedServers: true
            networkPolicy:
              # FIXME: For dask gateway
              enabled: false
            readinessProbe:
              enabled: false
      dask-gateway:
        extraConfig:
          idle: |
            # timeout after 30 minutes of inactivity
            c.KubeClusterConfig.idle_timeout = 1800
  - name: prod
    domain: meom-ige.2i2c.cloud
    template: daskhub
    auth0:
      connection: github
    config: *meomConfig
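
The comment above `profileList` says each `mem_guarantee` should sit just under a node's allocatable RAM so that Kubernetes does not place other pods on the same node. One way to read that is sketched below: the guarantee must fit on the node, but two such guarantees must not. This is not part of the PR. The function names are hypothetical, and the allocatable figure in the usage line is a placeholder rather than a measured value (the real number would come from the node itself).

```python
# Hypothetical sanity check, not part of the deployer.

def parse_gigabytes(value: str) -> float:
    """Return the numeric part of a value such as '5.5G'."""
    if not value.endswith("G"):
        raise ValueError(f"expected a value ending in 'G', got {value!r}")
    return float(value[:-1])


def one_user_pod_per_node(mem_guarantee: str, allocatable: str) -> bool:
    """True if one pod with this guarantee fits on the node but two do not."""
    guarantee = parse_gigabytes(mem_guarantee)
    available = parse_gigabytes(allocatable)
    return guarantee <= available < 2 * guarantee


if __name__ == "__main__":
    # "6G" is a placeholder for a node's allocatable memory, not a real measurement.
    print(one_user_pod_per_node("5.5G", "6G"))
```

The check compares like with like, so it does not depend on how the `G` suffix is converted to bytes.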
22 changes: 22 additions & 0 deletions deployer/__main__.py
@@ -29,6 +29,23 @@ def build(cluster_name):
    cluster.build_image()


def deploy_support(cluster_name):
    """
    Deploy support components to a cluster
    """

    # Validate our config with JSON Schema first before continuing
    validate(cluster_name)


    config_file_path = Path(os.getcwd()) / "config/hubs" / f'{cluster_name}.cluster.yaml'
    with open(config_file_path) as f:
        cluster = Cluster(yaml.load(f))

    if cluster.support:
        with cluster.auth():
            cluster.deploy_support()

def deploy(cluster_name, hub_name, skip_hub_health_test, config_path):
    """
    Deploy one or more hubs in a given cluster
@@ -97,6 +114,7 @@ def main():
    build_parser = subparsers.add_parser("build")
    deploy_parser = subparsers.add_parser("deploy")
    validate_parser = subparsers.add_parser("validate")
    deploy_support_parser = subparsers.add_parser("deploy-support")

    build_parser.add_argument("cluster_name")

@@ -107,6 +125,8 @@ def main():

    validate_parser.add_argument("cluster_name")

    deploy_support_parser.add_argument("cluster_name")

    args = argparser.parse_args()

    if args.action == "build":
@@ -115,6 +135,8 @@ def main():
        deploy(args.cluster_name, args.hub_name, args.skip_hub_health_test, args.config_path)
    elif args.action == 'validate':
        validate(args.cluster_name)
    elif args.action == 'deploy-support':
        deploy_support(args.cluster_name)
    else:
        # Print help message and exit when no arguments are passed
        # FIXME: Is there a better way to do this?
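
The `FIXME` in `main()` asks whether there is a better way to handle a missing subcommand. A minimal sketch of one option follows, assuming the subparsers are created with `dest="action"` (the diff only shows `args.action` being read, so that is an inference). Passing `required=True` to `add_subparsers` (Python 3.7+) makes argparse itself exit with a usage error when no subcommand is given. The `build_argparser` and `dispatch` helpers are hypothetical, and the `deploy` subcommand is left out because its extra arguments are not fully visible in this diff.

```python
import argparse


def build_argparser() -> argparse.ArgumentParser:
    """Build a parser with the single-argument subcommands shown in the diff."""
    argparser = argparse.ArgumentParser(prog="deployer")
    # required=True: argparse reports an error and exits if no subcommand is passed.
    subparsers = argparser.add_subparsers(dest="action", required=True)
    for name in ("build", "validate", "deploy-support"):
        subparsers.add_parser(name).add_argument("cluster_name")
    return argparser


def dispatch(args: argparse.Namespace, handlers: dict):
    """Call the handler registered for the chosen subcommand."""
    return handlers[args.action](args.cluster_name)


if __name__ == "__main__":
    parsed = build_argparser().parse_args(["deploy-support", "meom-ige"])
    dispatch(parsed, {"deploy-support": lambda name: print("would deploy support to", name)})
```

A handler table like this replaces the `if`/`elif` chain, at the cost of every handler needing the same call signature.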
27 changes: 27 additions & 0 deletions deployer/hub.py
@@ -29,6 +29,7 @@ def __init__(self, spec):
            Hub(self, hub_yaml)
            for hub_yaml in self.spec['hubs']
        ]
        self.support = self.spec.get('support', {})

    def build_image(self):
        self.ensure_docker_credhelpers()
@@ -77,6 +78,32 @@ def ensure_docker_credhelpers(self):
        with open(dockercfg_path, 'w') as f:
            json.dump(config, f, indent=4)

    def deploy_support(self):
        cert_manager_version = 'v1.3.1'

        print("Provisioning cert-manager...")
        subprocess.check_call([
            'helm', 'upgrade', '--install', '--create-namespace',
            '--namespace', 'cert-manager',
            'cert-manager', 'jetstack/cert-manager',
            '--version', cert_manager_version,
            '--set', 'installCRDs=true'
        ])
        print("Done!")

        print("Support charts...")

        with tempfile.NamedTemporaryFile(mode='w') as f:
            yaml.dump(self.support.get('config', {}), f)
            f.flush()
            subprocess.check_call([
                'helm', 'upgrade', '--install', '--create-namespace',
                '--namespace', 'support',
                'support', 'support',
                '-f', f.name,
                '--wait'
            ])
        print("Done!")

    def auth_kubeconfig(self):
        """
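
`deploy_support()` issues two `helm upgrade --install` calls that differ only in release, chart, namespace and a few flags. The sketch below is not part of the PR. It shows how the argument lists could be built by a hypothetical helper, `helm_upgrade_command`, so they can be inspected without running `helm`. It uses only flags that already appear in the diff.

```python
def helm_upgrade_command(release, chart, namespace, *, version=None,
                         set_values=None, values_file=None, wait=False):
    """Return the argv list for `helm upgrade --install` (hypothetical helper)."""
    cmd = [
        'helm', 'upgrade', '--install', '--create-namespace',
        '--namespace', namespace,
        release, chart,
    ]
    if version:
        cmd += ['--version', version]
    # Each key/value pair becomes its own `--set key=value` flag.
    for key, value in (set_values or {}).items():
        cmd += ['--set', f'{key}={value}']
    if values_file:
        cmd += ['-f', values_file]
    if wait:
        cmd.append('--wait')
    return cmd


if __name__ == "__main__":
    print(helm_upgrade_command(
        'cert-manager', 'jetstack/cert-manager', 'cert-manager',
        version='v1.3.1', set_values={'installCRDs': 'true'},
    ))
```

The returned list can be passed to `subprocess.check_call` as in the diff.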
84 changes: 44 additions & 40 deletions docs/conf.py
@@ -62,43 +62,47 @@
from yaml import safe_load
import pandas as pd
from pathlib import Path

# Grab the latest list of clusters defined in pilot-hubs/
clusters = Path("../config/hubs").glob("*")
# Add list of repos managed outside pilot-hubs
hub_list = [{
    'name': 'University of Toronto',
    'domain': 'jupyter.utoronto.ca',
    'id': 'utoronto',
    'template': 'base-hub ([deployment repo](https://github.com/utoronto-2i2c/jupyterhub-deploy/))'
}]
for cluster_info in clusters:
    if "schema" in cluster_info.name:
        continue
    # For each cluster, grab its YAML w/ the config for each hub
    yaml = cluster_info.read_text()
    cluster = safe_load(yaml)

    # For each hub in cluster, grab its metadata and add it to the list
    for hub in cluster['hubs']:
        config = hub['config']
        # Config is sometimes nested
        if 'basehub' in config:
            hub_config = config['basehub']['jupyterhub']
        else:
            hub_config = config['jupyterhub']
        # Domain can be a list
        if isinstance(hub['domain'], list):
            hub['domain'] = hub['domain'][0]

        hub_list.append({
            'name': hub_config['custom']['homepage']['templateVars']['org']['name'],
            'domain': f"[{hub['domain']}](https://{hub['domain']})",
            "id": hub['name'],
            "template": hub['template'],
        })
df = pd.DataFrame(hub_list)
path_tmp = Path("tmp")
path_tmp.mkdir(exist_ok=True)
path_table = path_tmp / "hub-table.csv"
df.to_csv(path_table, index=None)
import subprocess

def render_hubs():
    # Grab the latest list of clusters defined in pilot-hubs/
    clusters = Path("../config/hubs").glob("*")
    # Add list of repos managed outside pilot-hubs
    hub_list = [{
        'name': 'University of Toronto',
        'domain': 'jupyter.utoronto.ca',
        'id': 'utoronto',
        'template': 'base-hub ([deployment repo](https://github.com/utoronto-2i2c/jupyterhub-deploy/))'
    }]
    for cluster_info in clusters:
        if "schema" in cluster_info.name:
            continue
        # For each cluster, grab its YAML w/ the config for each hub
        yaml = cluster_info.read_text()
        cluster = safe_load(yaml)

        # For each hub in cluster, grab its metadata and add it to the list
        for hub in cluster['hubs']:
            config = hub['config']
            # Config is sometimes nested
            if 'basehub' in config:
                hub_config = config['basehub']['jupyterhub']
            else:
                hub_config = config['jupyterhub']
            # Domain can be a list
            if isinstance(hub['domain'], list):
                hub['domain'] = hub['domain'][0]

            hub_list.append({
                'name': hub_config['custom']['homepage']['templateVars']['org']['name'],
                'domain': f"[{hub['domain']}](https://{hub['domain']})",
                "id": hub['name'],
                "template": hub['template'],
            })
    df = pd.DataFrame(hub_list)
    path_tmp = Path("tmp")
    path_tmp.mkdir(exist_ok=True)
    path_table = path_tmp / "hub-table.csv"
    df.to_csv(path_table, index=None)

render_hubs()
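
The per-hub logic inside `render_hubs()` handles two special cases: the config may be nested under `basehub`, and `domain` may be a list. The sketch below is not part of the PR. It isolates that logic in a hypothetical `hub_row` function that takes an already-parsed hub dict, and it swaps pandas for the standard-library `csv` module so that it runs without third-party packages. Unlike the code in the diff, it does not modify the input dict.

```python
import csv
import io


def hub_row(hub: dict) -> dict:
    """Build one table row from a parsed hub entry (hypothetical helper)."""
    config = hub["config"]
    # Config is sometimes nested under "basehub" (e.g. for the daskhub template).
    hub_config = config["basehub"]["jupyterhub"] if "basehub" in config else config["jupyterhub"]
    domain = hub["domain"]
    # Domain can be a list; use the first entry.
    if isinstance(domain, list):
        domain = domain[0]
    return {
        "name": hub_config["custom"]["homepage"]["templateVars"]["org"]["name"],
        "domain": f"[{domain}](https://{domain})",
        "id": hub["name"],
        "template": hub["template"],
    }


def rows_to_csv(rows) -> str:
    """Serialize rows to CSV text with the same columns as the hub table."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "domain", "id", "template"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    example = {
        "name": "staging",
        "domain": "staging.meom-ige.2i2c.cloud",
        "template": "daskhub",
        "config": {"basehub": {"jupyterhub": {"custom": {"homepage": {
            "templateVars": {"org": {"name": "SWOT Ocean Pangeo Team"}}}}}}},
    }
    print(rows_to_csv([hub_row(example)]))
```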
2 changes: 2 additions & 0 deletions docs/index.md
@@ -42,6 +42,8 @@ Topic guides go more in-depth on a particular topic.
topic/config.md
topic/hub-templates.md
topic/storage-layer.md
topic/terraform.md
topic/cluster-design.md
```

## Reference