This repo is a work in progress containing customized Kubernetes-based JupyterHub deployments. It is part of the DataScience@OregonState project, which aims to develop campus-scale infrastructure for teaching topics in data science.
Currently several components are specific to OSU infrastructure and in need of further testing and development, with beta tests ongoing.
The DS@OSU project included a thorough needs assessment process involving leadership from all Colleges, evaluation of current classroom "data science" software use and pain points across campus, and continual input from both academic faculty and a technical advisory team. After careful review, the Zero to JupyterHub with Kubernetes project was chosen as a starting point, as it balances scalability, stability, and flexibility while covering the most pressing teaching needs (Jupyter, Python, R, RStudio, Linux command-line).
Based on faculty and technical advisory feedback, this project implements the following additional desired features:
- Based on the datascience-notebook Jupyter Docker Stack, we support:
  - JupyterLab, the latest-generation Jupyter interface
  - Jupyter notebooks, Python 3, Julia, R, RStudio, and R Shiny
  - A wide array of pre-installed Python and R packages
- For each hub, shared storage space with "classroom" permissions:
  - Students can read and write in their own home directories
  - Instructors (or other admins such as TAs) can directly browse and edit student data
  - A `hub_data_share` for instructor staging of data
- All users can install scripts and R and Python packages for their own use
- Instructors can install scripts and R and Python packages for everyone
- Additional hooks for instructors to customize all user environments
READMEs in this repo are intended for Kubernetes administrators; end-user documents are maintained in the `user_docs` folder (though more recent and complete documentation is available via an OSU Canvas studio site with integrated "playground" hub access).
Development assumes the local machine has `kubectl`, the local `helm3` client, `docker` (on Mac, `docker-desktop`), and `kustomize` installed.
Using `alias k=kubectl` in your `.bashrc` is a great tip :)
I also recommend checking out `krew`, a plugin manager for `kubectl`. The `konfig` plugin in particular makes managing kubeconfig files for different clusters relatively painless.
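As a small sketch of the tip above, these lines could go in `~/.bashrc`. The commented-out line assumes krew itself is already installed; it is included only as an illustration and is not part of this repo's scripts.

```shell
# Shorthand for kubectl, so `k get pods` behaves like `kubectl get pods`
# in an interactive shell.
alias k=kubectl

# Assumption: krew is already installed. The konfig plugin can then be added with:
#   kubectl krew install konfig
```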
The Kubernetes cluster configurations in this repo target AWS EKS using the `eksctl` utility; to work with AWS clusters this way you'll need `eksctl`, an AWS account, and the `aws` command-line utility with credentials set up.
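A quick way to confirm the tools listed above are available is to check each one on `PATH`. The following is a minimal sketch, not a script shipped with this repo: the function name `missing_tools` is hypothetical, and it assumes the Helm 3 client is installed under the binary name `helm`. It only checks that the commands exist, not their versions or whether AWS credentials are configured.

```shell
# Print each named tool that is not found on PATH, one per line.
# (missing_tools is a hypothetical helper name used for illustration.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools named in this README; eksctl and aws matter only for AWS EKS clusters.
# Assumption: the Helm 3 client binary is called "helm".
missing="$(missing_tools kubectl helm docker kustomize eksctl aws)"

if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing"
else
  echo "All listed tools were found on PATH."
fi
```

The check prints a report rather than exiting, so it can be sourced from another script without stopping it.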
This README doesn't contain all the documentation; the rest is distributed within each subdirectory. The subdirectories are summarized here, in the order one might want to check them out:
Helm charts, most of which are wrappers around official Helm charts with additional site-specific configuration. Each chart also has a `scripts` subdirectory for last-mile configuration and deployment from settings files (in `deployments`).
Information and configuration for Kubernetes clusters, the ingress controller, cluster monitoring with Prometheus and Grafana, and cluster backup with Velero.
Organized by cluster hostname, this directory contains settings files for hub and cluster-tools deployments. Only an example subfolder is included in this git repo, to keep running hub URLs private.
Docker image definitions. Note that subdirectories here have a special structure used by the build and push scripts (in `scripts`, below).
This directory contains Kubernetes .yaml manifest files and potentially other things, mostly used for testing or development where Helm charts aren't a good fit. The `dev_singleuser_home_nfs` subdirectory in particular provides a workflow for fast development of JupyterHub singleuser servers, avoiding the need to redeploy JupyterHub and use the spawner for testing.
Scripts for working with Docker images (building and pushing to Docker Hub), Helm repos, and application/cluster deployment and teardown.
User-facing documentation.
Not documentation at all; instead, a Helm chart repository hosted by GitHub Pages (using the `docs` folder option for GitHub Pages rather than the `gh-pages` branch).
Enables the Jupyter single-user image to be deployed on BinderHub with this repo.