updated dockerfiles #261

Merged on May 18, 2018 (14 commits)
26 changes: 13 additions & 13 deletions gce/notebook/Dockerfile
@@ -8,15 +8,16 @@ RUN apt-get update \
USER $NB_USER

RUN conda install --yes -c defaults -c ioam -c bokeh/channel/dev -c intake \
bokeh=0.12.15dev1 \
bokeh=0.12.16 \
cython \
cytoolz \
datashader \
dask=0.17.5 \
dask-ml \
distributed=1.21.8 \
fastparquet \
ipywidgets \
jupyterlab \
jupyterlab=0.32.1 \
jupyterlab_launcher=0.10.5 \
holoviews \
lz4 \
matplotlib \
@@ -25,27 +26,26 @@ RUN conda install --yes -c defaults -c ioam -c bokeh/channel/dev -c intake \
nomkl \
numba \
numcodecs \
numpy \
numpy=1.14.3 \
pandas \
python-blosc \
scipy \
scikit-image \
scikit-learn \
tornado \
xarray \
zict \
intake-xarray \
&& conda clean -tipsy

RUN pip install fusepy click jedi kubernetes --upgrade --no-cache-dir
RUN pip install --upgrade pip

RUN pip install daskernetes==0.1.3 \
dask-kubernetes \
git+https://github.com/zarr-developers/zarr \
git+https://github.com/pydata/xarray \
git+https://github.com/dask/gcsfs \
git+https://github.com/jupyterhub/nbserverproxy \
git+https://github.com/xgcm/xgcm \
RUN pip install fusepy click jedi kubernetes==4.0.0 dask-kubernetes s3fs \
gcsfs==0.1.0 zarr==2.2.0 xarray==0.10.4 \
nbserverproxy==0.8.1 \
--upgrade --no-cache-dir

RUN pip install git+https://github.com/xgcm/xgcm \
git+https://github.com/bokeh/datashader.git \
--no-cache-dir \
--upgrade

212 changes: 212 additions & 0 deletions gce/notebook/examples/cm26.ipynb
@@ -0,0 +1,212 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CM2.6 Ocean Model Analysis\n",
"\n",
"This notebook shows how to load and analyze ocean data from the GFDL [CM2.6](https://www.gfdl.noaa.gov/cm2-6/) high-resolution climate simulation.\n",
"\n",
"![CM2.6 SST](https://www.gfdl.noaa.gov/wp-content/uploads/ih/2012/06/cm2.6.png)\n",
"\n",
"Right now the only available output is the set of 5-day-averaged 3D fields of horizontal velocity, temperature, and salinity. We hope to add more going forward.\n",
"\n",
"Thanks to [Stephen Griffies](https://www.gfdl.noaa.gov/stephen-griffies-homepage/) for providing the data.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import numpy as np\n",
"import xarray as xr\n",
"import matplotlib.pyplot as plt\n",
"import holoviews as hv\n",
"import datashader\n",
"from holoviews.operation.datashader import regrid, shade, datashade\n",
"\n",
"hv.extension('bokeh', width=100)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create and Connect to Dask Distributed Cluster\n",
"\n",
"This will launch a cluster of virtual machines in the cloud."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dask.distributed import Client, progress\n",
"from dask_kubernetes import KubeCluster\n",
"cluster = KubeCluster(n_workers=40)\n",
"cluster"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"👆 Don't forget to click this link to get the cluster dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client = Client(cluster)\n",
"client"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load CM2.6 Data\n",
"\n",
"This data is stored in [xarray-zarr](http://xarray.pydata.org/en/latest/io.html#zarr) format in Google Cloud Storage.\n",
"This format is optimized for parallel distributed reads from within the cloud environment.\n",
"\n",
"It may take up to a minute to initialize the dataset when you run this cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#experiment = 'one_percent'\n",
"experiment = 'control'\n",
"\n",
"# Load with Cloud object storage\n",
"import gcsfs\n",
"gcsmap = gcsfs.mapping.GCSMap('pangeo-data/cm2.6/%s/temp_salt_u_v-5day_avg/' % experiment)\n",
"ds = xr.open_zarr(gcsmap, decode_cf=True, decode_times=False)\n",
"\n",
"# Print dataset\n",
"ds"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualize Temperature Data with Holoviews and Datashader\n",
"\n",
"The cells below show how to interactively explore the dataset.\n",
"\n",
"_**Warning**: it takes ~10-20 seconds to render each image after moving the sliders. Please be patient. There is an open [GitHub issue](https://github.com/bokeh/datashader/issues/598) about improving the performance of datashader with this sort of dataset._"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hv_ds = hv.Dataset(ds['temp'])\n",
"qm = hv_ds.to(hv.QuadMesh, kdims=[\"xt_ocean\", \"yt_ocean\"], dynamic=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%opts RGB [width=1000 height=600] \n",
"\n",
"# runs out of memory easily...change options at your own risk\n",
"datashade(qm, precompute=False, cmap=plt.cm.magma)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Make an Expensive Calculation\n",
"\n",
"Here we make a big reduction by taking the time and zonal mean of the temperature. This demonstrates how the cluster distributes the reads from storage."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"temp_zonal_mean = ds.temp.mean(dim=('time', 'xt_ocean'))\n",
"temp_zonal_mean"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This next cell may take a while, depending on the size of your cluster. On a cluster of 40 workers, it took ~12 minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%time temp_zonal_mean.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig, ax = plt.subplots(figsize=(16,8))\n",
"temp_zonal_mean.plot.contourf(yincrease=False, levels=np.arange(-2,30))\n",
"plt.title('Naive Zonal Mean Temperature')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
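
Review note on the "Make an Expensive Calculation" cells in `cm26.ipynb`: the sketch below shows, in plain Python, what a reduction like `ds.temp.mean(dim=('time', 'xt_ocean'))` computes. It is an illustration only, not the notebook's implementation. The nested-list layout `temp[time][depth][lat][lon]`, the toy values, and the helper names `nanmean` and `time_zonal_mean` are all assumptions made for this example. NaN stands in for land points, which are skipped, mirroring xarray's default handling of NaN in floating-point data.

```python
import math


def nanmean(values):
    """Mean of the non-NaN entries; NaN if every entry is NaN."""
    finite = [v for v in values if not math.isnan(v)]
    return sum(finite) / len(finite) if finite else math.nan


def time_zonal_mean(temp):
    """Reduce temp[time][depth][lat][lon] to mean[depth][lat].

    All valid points along time and longitude are pooled into a single
    unweighted mean for each (depth, lat) pair.
    """
    n_depth = len(temp[0])
    n_lat = len(temp[0][0])
    return [
        [
            nanmean([v for snapshot in temp for v in snapshot[k][j]])
            for j in range(n_lat)
        ]
        for k in range(n_depth)
    ]


nan = math.nan
# Hypothetical toy field: 2 time steps, 1 depth level, 2 latitudes, 3 longitudes.
toy_temp = [
    [[[10.0, 12.0, nan], [20.0, 22.0, 24.0]]],
    [[[14.0, 16.0, nan], [22.0, 24.0, 26.0]]],
]

zonal_mean = time_zonal_mean(toy_temp)
print(zonal_mean)  # → [[13.0, 23.0]]
```

The mean is unweighted: every valid grid point counts equally regardless of cell area, which is one plausible reading of the "Naive" in the notebook's plot title. On the real dataset the reduction runs lazily across the Dask workers rather than in a Python loop.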
2 changes: 1 addition & 1 deletion gce/notebook/worker-template.yaml
@@ -11,7 +11,7 @@ spec:
- 6GB
- --death-timeout
- '60'
image: pangeo/worker:2018-05-06
image: pangeo/worker:2018-05-17
name: dask-worker
securityContext:
capabilities:
15 changes: 6 additions & 9 deletions gce/worker/Dockerfile
@@ -5,6 +5,7 @@ RUN wget -O /usr/local/bin/dumb-init https://github.com/Yelp/dumb-init/releases/
RUN chmod +x /usr/local/bin/dumb-init

RUN conda install --yes -c conda-forge \
cython \
cytoolz \
dask=0.17.4 \
distributed=1.21.8 \
@@ -15,12 +16,11 @@ RUN conda install --yes -c conda-forge \
nomkl \
numba \
numcodecs \
numpy \
numpy=1.14.3 \
pandas \
python-blosc \
scikit-image \
scipy \
xarray \
zict \
&& conda clean -tipsy

@@ -29,14 +29,11 @@ RUN apt-get update \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

RUN pip install pyasn1 click urllib3 --upgrade
RUN pip install --upgrade pip

RUN pip install git+https://github.com/zarr-developers/zarr \
git+https://github.com/pydata/xarray \
git+https://github.com/dask/gcsfs@f99177b31c44fcc404619b2876a77cdcda955a75 \
fusepy \
--no-cache-dir \
--upgrade
RUN pip install pyasn1 click urllib3 fusepy s3fs \
gcsfs==0.1.0 zarr==2.2.0 xarray==0.10.4 \
--upgrade --no-cache-dir

ENV OMP_NUM_THREADS=1
ENV DASK_TICK_MAXIMUM_DELAY=5s
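
Review note on the version pins: this PR pins several packages in both `gce/notebook/Dockerfile` and `gce/worker/Dockerfile`, so a quick check that the two images agree may be useful. The sketch below is a hypothetical helper, not part of this repository. It parses conda-style (`name=version`) and pip-style (`name==version`) pins from install arguments and reports packages pinned differently. The pin strings are copied from the diff as rendered on this page. The names `parse_pins` and `pin_mismatches` are assumptions made for illustration.

```python
import re

# Matches conda-style pins (name=1.2.3) and pip-style pins (name==1.2.3).
PIN_RE = re.compile(r"^([A-Za-z0-9_.\-]+)={1,2}([A-Za-z0-9_.]+)$")


def parse_pins(install_args):
    """Return {package: version} for every pinned token in an install command."""
    pins = {}
    for token in install_args.split():
        match = PIN_RE.match(token)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def pin_mismatches(a, b):
    """Return {package: (version_a, version_b)} for packages pinned differently."""
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }


# Pinned tokens as they appear in the two Dockerfiles in this diff.
NOTEBOOK_ARGS = """
    bokeh=0.12.16 dask=0.17.5 distributed=1.21.8 jupyterlab=0.32.1
    jupyterlab_launcher=0.10.5 numpy=1.14.3 kubernetes==4.0.0
    gcsfs==0.1.0 zarr==2.2.0 xarray==0.10.4 nbserverproxy==0.8.1
"""
WORKER_ARGS = """
    dask=0.17.4 distributed=1.21.8 numpy=1.14.3
    gcsfs==0.1.0 zarr==2.2.0 xarray==0.10.4
"""

notebook_pins = parse_pins(NOTEBOOK_ARGS)
worker_pins = parse_pins(WORKER_ARGS)
mismatches = pin_mismatches(notebook_pins, worker_pins)
print(mismatches)  # → {'dask': ('0.17.5', '0.17.4')}
```

Run against the pins shown here, the only difference is `dask` (0.17.5 in the notebook image, 0.17.4 in the worker image). The diff does not say whether that is intentional.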